Search Results: "tv"

29 December 2022

Russ Allbery: Review: Sweep of the Heart

Review: Sweep of the Heart, by Ilona Andrews

Series:	Innkeeper Chronicles #6
Publisher:	NYLA Publishing
Copyright:	2022
ISBN:	1-64197-239-4
Format:	Kindle
Pages:	440

Sweep of the Heart is the sixth book of the sci-fi urban fantasy, kitchen-sink-worldbuilding Innkeeper series by husband and wife writing pair Ilona Andrews, assuming one counts the novella Sweep with Me as a full entry (which I do). It's a direct sequel to One Fell Sweep, but also references the events of Sweep of the Blade and Sweep with Me enough to spoil them. Needless to say, don't start here. As always with this series, the book was originally published as a serial on Ilona Andrews's blog. I prefer to read my novels as novels, so I wait until the entries are collected and published, but you can read it on-line for free if you want. Sean and Dina's old friend Wilmos has been kidnapped by an enemy who looks familiar from One Fell Sweep. To get him back, they need to get to a world that is notoriously inaccessible. One player in galactic politics may be able to offer a portal, but it will come as a price. That price? Host a reality TV show. Specifically, a sci-fi version of The Bachelor, with aliens. And the bachelor is the ruler of a galactic empire, whose personal safety is now Dina's responsibility. There is a hand-waving explanation for why the Seven Star Dominion does spouse selection for their rulers this way, but let's be honest: it's a fairly transparent excuse to write a season of The Bachelor with strange aliens, political intrigue, inn-generated special effects and wallpaper-worthy backdrops, ulterior motives, and attempted murder. Oh, and competence porn, as Dina once again demonstrates just how good she's become at being an innkeeper. I'm not much of a reality TV fan, have never watched The Bachelor, and still thoroughly enjoyed this. It helps that the story is more about political intrigue than it is about superficial attraction or personal infighting, and the emperor at the center of the drama is calm, thoughtful, and juggling a large number of tricky problems (which Dina, somewhat improbably, becomes privy to). The contestants range from careful diplomats with hidden political goals to eye candy with the subtlety of a two by four, the latter sponsored by sentient murderous trees, so there's a delightful variety of tone and a ton of narrative momentum. A few of the twists and turns were obvious, but some of the cliches are less cliched than they initially look. This series always leans towards "play with every toy in the toy box at once!" rather than subtle and realistic. This entry is no exception, but the mish-mash of science fiction tropes with nigh-unlimited fantasy power is, as usual, done with so much verve and sheer creative joy that I can't help but love it. We do finally learn Caldenia's past, and... I kind of wish we hadn't? Or at least that her past had been a bit more complicated. I will avoid spoiling it by saying too much, but I thought it was an oddly flat and overdone trope that made Caldenia substantially less interesting than she was before this revelation. That was one mild disappointment. The other is that the opening of Sweep of the Heart teases some development of the overall series plot, but that remains mostly a tease. Wilmos's kidnapping and any relevance to deeper innkeeper problems is, at least in this entry, merely a framing story for the reality TV show that constitutes the bulk of the novel. There are a few small revelations in the conclusion, but only the type that raise more questions. Hopefully we'll get more series plot development in the next book, but even if we don't, I'm happily along for the ride. If you like this series, this is more of the thing you already like. If you haven't read it yet, I highly recommend it (start with Clean Sweep). It's not great literature, and most of the trappings will be familiar from a dozen other novels and TV shows, but it's unabashed fun with loads of competence porn and a wild internal logic that grows on you over time. Also, it has one of the most emotionally satisfying sentient buildings in SF. There will, presumably, be more entries in the series, but they have not yet been announced. Rating: 8 out of 10

18 December 2022

Ian Jackson: Rust needs #[throws]

tl;dr: Ok-wrapping as needed in today s Rust is a significant distraction, because there are multiple ways to do it. They are all slightly awkward in different ways, so are least-bad in different situations. You must choose a way for every fallible function, and sometimes change a function from one pattern to another. Rust really needs #[throws] as a first-class language feature. Code using #[throws] is simpler and clearer. Please try out withoutboats s fehler. I think you will like it. Contents

A recent personal experience in coding style Ever since I read withoutboats s 2020 article about fehler, I have been using it in most of my personal projects. For Reasons I recently had a go at eliminating the dependency on fehler from Hippotat. So, I made a branch, deleted the dependency and imports, and started on the whack-a-mole with the compiler errors. After about a half hour of this, I was starting to feel queasy. After an hour I had decided that basically everything I was doing was making the code worse. And, bizarrely, I kept having to make individual decisons about what idiom to use in each place. I couldn t face it any more. After sleeping on the question I decided that Hippotat would be in Debian with fehler, or not at all. Happily the Debian Rust Team generously helped me out, so the answer is that fehler is now in Debian, so it s fine. For me this experience, of trying to convert Rust-with-#[throws] to Rust-without-#[throws] brought the Ok wrapping problem into sharp focus. What is Ok wrapping? Intro to Rust error handling (You can skip this section if you re already a seasoned Rust programer.) In Rust, fallibility is represented by functions that return Result<SuccessValue, Error>: this is a generic type, representing either whatever SuccessValue is (in the Ok variant of the data-bearing enum) or some Error (in the Err variant). For example, std::fs::read_to_string, which takes a filename and returns the contents of the named file, returns Result<String, std::io::Error>. This is a nice and typesafe formulation of, and generalisation of, the traditional C practice, where a function indicates in its return value whether it succeeded, and errors are indicated with an error code. Result is part of the standard library and there are convenient facilities for checking for errors, extracting successful results, and so on. In particular, Rust has the postfix ? operator, which, when applied to a Result, does one of two things: if the Result was Ok, it yields the inner successful value; if the Result was Err, it returns early from the current function, returning an Err in turn to the caller. This means you can write things like this:

    let input_data = std::fs::read_to_string(input_file)?;

and the error handling is pretty automatic. You get a compiler warning, or a type error, if you forget the ?, so you can t accidentally ignore errors. But, there is a downside. When you are returning a successful outcome from your function, you must convert it into a Result. After all, your fallible function has return type Result<SuccessValue, Error>, which is a different type to SuccessValue. So, for example, inside std::fs::read_to_string, we see this:

        let mut string = String::new();
        file.read_to_string(&mut string)?;
        Ok(string)

string has type String; fs::read_to_string must return Result<String, ..>, so at the end of the function we must return Ok(string). This applies to return statements, too: if you want an early successful return from a fallible function, you must write return Ok(whatever). This is particularly annoying for functions that don t actually return a nontrivial value. Normally, when you write a function that doesn t return a value you don t write the return type. The compiler interprets this as syntactic sugar for -> (), ie, that the function returns (), the empty tuple, used in Rust as a dummy value in these kind of situations. A block ( ... ) whose last statement ends in a ; has type (). So, when you fall off the end of a function, the return value is (), without you having to write it. So you simply leave out the stuff in your program about the return value, and your function doesn t have one (i.e. it returns ()). But, a function which either fails with an error, or completes successfuly without returning anything, has return type Result<(), Error>. At the end of such a function, you must explicitly provide the success value. After all, if you just fall off the end of a block, it means the block has value (), which is not of type Result<(), Error>. So the fallible function must end with Ok(()), as we see in the example for std::fs::read_to_string. A minor inconvenience, or a significant distraction? I think the need for Ok-wrapping on all success paths from fallible functions is generally regarded as just a minor inconvenience. Certainly the experienced Rust programmer gets very used to it. However, while trying to remove fehler s #[throws] from Hippotat, I noticed something that is evident in codebases using vanilla Rust (without fehler) but which goes un-remarked. There are multiple ways to write the Ok-wrapping, and the different ways are appropriate in different situations. See the following examples, all taken from a real codebase. (And it s not just me: I do all of these in different places, - when I don t have fehler available - but all these examples are from code written by others.) Idioms for Ok-wrapping - a bestiary Wrap just a returned variable binding If you have the return value in a variable, you can write Ok(reval) at the end of the function, instead of retval.

    pub fn take_until(&mut self, term: u8) -> Result<&'a [u8]>  
        // several lines of code
        Ok(result)

If the returned value is not already bound to variable, making a function fallible might mean choosing to bind it to a variable. Wrap a nontrivial return expression Even if it s not just a variable, you can wrap the expression which computes the returned value. This is often done if the returned value is a struct literal:

    fn take_from(r: &mut Reader<'_>) -> Result<Self>  
        // several lines of code
        Ok(AuthChallenge   challenge, methods  )

Introduce Ok(()) at the end For functions returning Result<()>, you can write Ok(()). This is usual, but not ubiquitous, since sometimes you can omit it. Wrap the whole body If you don t have the return value in a variable, you can wrap the whole body of the function in Ok( ). Whether this is a good idea depends on how big and complex the body is.

    fn from_str(s: &str) -> std::result::Result<Self, Self::Err>  
        Ok(match s  
            "Authority" => RelayFlags::AUTHORITY,
            // many other branches
            _ => RelayFlags::empty(),
         )

Omit the wrap when calling fallible sub-functions If your function wraps another function call of the same return and error type, you don t need to write the Ok at all. Instead, you can simply call the function and not apply ?. You can do this even if your function selects between a number of different sub-functions to call:

    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result  
        if flags::unsafe_logging_enabled()  
            std::fmt::Display::fmt(&self.0, f)
          else  
            self.0.display_redacted(f)

But this doesn t work if the returned error type isn t the same, but needs the autoconversion implied by the ? operator. Convert a fallible sub-function error with Ok( ... ?) If the final thing a function does is chain to another fallible function, but with a different error type, the error must be converted somehow. This can be done with ?.

     fn try_from(v: i32) -> Result<Self, Error>  
         Ok(Percentage::new(v.try_into()?))

Convert a fallible sub-function error with .map_err Or, rarely, people solve the same problem by converting explicitly with .map_err:

     pub fn create_unbootstrapped(self) -> Result<TorClient<R>>  
         // several lines of code
         TorClient::create_inner(
             // several parameters
         )
         .map_err(ErrorDetail::into)

What is to be done, then? The fehler library is in excellent taste and has the answer. With fehler:

Whether a function is fallible, and what it s error type is, is specified in one place. It is not entangled with the main return value type, nor with the success return paths.
So the success paths out of a function are not specially marked with error handling boilerplate. The end of function return value, and the expression after return, are automatically wrapped up in Ok. So the body of a fallible function is just like the body of an infallible one, except for places where error handling is actually involved.
Error returns occur through ? error chaining, and with a new explicit syntax for error return.
We usually talk about the error we are possibly returning, and avoid talking about Result unless we need to.

fehler provides:

An attribute macro #[throws(ErrorType)] to make a function fallible in this way.
A macro throws!(error) for explicitly failing.

This is precisely correct. It is very ergonomic. Consequences include:

One does not need to decide where to put the Ok-wrapping, since it s automatic rather than explicitly written out.
Specifically, what idiom to adopt in the body (for example write!(...)?; vs write!(...) in a formatter) does not depend on whether the error needs converting, how complex the body is, and whether the final expression in the function is itself fallible.
Making an infallible function fallible involves only adding #[throws] to its definition, and ? to its call sites. One does not need to edit the body, or the return type.
Changing the error returned by a function to a suitably compatible different error type does not involve changing the function body.
There is no need for a local Result alias shadowing std::result::Result, which means that when one needs to speak of Result explciitly, the code is clearer.

Limitations of fehler But, fehler is a Rust procedural macro, so it cannot get everything right. Sadly there are some wrinkles.

You can t write #[throws] on a closure.
Sometimes you can get quite poor error messages if you have a sufficiently broken function body.
Code inside a macro call isn t properly visible to fehler so sometimes return statements inside macro calls are untreated. This will lead to a type error, so isn t a correctness hazard, but it can be nuisance if you like other syntax extensions eg if_chain.
#[must_use] #[throws(Error)] fn obtain() -> Thing; ought to mean that Thing must be used, not the Result<Thing, Error>.

But, Rust-with-#[throws] is so much nicer a language than Rust-with-mandatory-Ok-wrapping, that these are minor inconveniences. Please can we have #[throws] in the Rust language This ought to be part of the language, not a macro library. In the compiler, it would be possible to get the all the corner cases right. It would make the feature available to everyone, and it would quickly become idiomatic Rust throughout the community. It is evident from reading writings from the time, particularly those from withoutboats, that there were significant objections to automatic Ok-wrapping. It seems to have become quite political, and some folks burned out on the topic. Perhaps, now, a couple of years later, we can revisit this area and solve this problem in the language itself ? Explicitness An argument I have seen made against automatic Ok-wrapping, and, in general, against any kind of useful language affordance, is that it makes things less explicit. But this argument is fundamentally wrong for Ok-wrapping. Explicitness is not an unalloyed good. We humans have only limited attention. We need to focus that attention where it is actually needed. So explicitness is good in situtions where what is going on is unusual; or would otherwise be hard to read; or is tricky or error-prone. Generally: explicitness is good for things where we need to direct humans attention. But Ok-wrapping is ubiquitous in fallible Rust code. The compiler mechanisms and type systems almost completely defend against mistakes. All but the most novice programmer knows what s going on, and the very novice programmer doesn t need to. Rust s error handling arrangments are designed specifically so that we can avoid worrying about fallibility unless necessary except for the Ok-wrapping. Explicitness about Ok-wrapping directs our attention away from whatever other things the code is doing: it is a distraction. So, explicitness about Ok-wrapping is a bad thing. Appendix - examples showning code with Ok wrapping is worse than code using #[throws] Observe these diffs, from my abandoned attempt to remove the fehler dependency from Hippotat. I have a type alias AE for the usual error type (AE stands for anyhow::Error). In the non-#[throws] code, I end up with a type alias AR<T> for Result<T, AE>, which I think is more opaque but at least that avoids typing out -> Result< , AE> a thousand times. Some people like to have a local Result alias, but that means that the standard Result has to be referred to as StdResult or std::result::Result.

With `fehler` and `#[throws]`	Vanilla Rust, `Result<>`, mandatory `Ok-wrapping`

Return value clearer, error return less wordy:
`impl Parseable for Secret`	`impl Parseable for Secret`
`#[throws(AE)]`
`fn parse(s: Option<&str>) -> Self`	`fn parse(s: Option<&str>) -> AR<Self>`
`let s = s.value()?;`	`let s = s.value()?;`
`if s.is_empty() throw!(anyhow!( secret value cannot be empty ))`	`if s.is_empty() return Err(anyhow!( secret value cannot be empty ))`
`Secret(s.into())`	`Ok(Secret(s.into()))`


No need to wrap whole match statement in `Ok( ):`
`#[throws(AE)]`
`pub fn client<T>(&self, key: & static str, skl: SKL) -> T`	`pub fn client<T>(&self, key: & static str, skl: SKL) -> AR<T>`
`where T: Parseable + Default`	`where T: Parseable + Default`
`match self.end`	`Ok(match self.end`
`LinkEnd::Client => self.ordinary(key, skl)?,`	`LinkEnd::Client => self.ordinary(key, skl)?,`
`LinkEnd::Server => default(),`	`LinkEnd::Server => default(),`
	`)`

Return value and `Ok(())` entirely replaced by `#[throws]`:
`impl Display for Loc`	`impl Display for Loc`
`#[throws(fmt::Error)]`
`fn fmt(&self, f: &mut fmt::Formatter)`	`fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result`
`write!(f, :? : , &self.file, self.lno)?;`	`write!(f, :? : , &self.file, self.lno)?;`
`if let Some(s) = &self.section`	`if let Some(s) = &self.section`
`write!(f, )?;`	`write!(f, )?;`


	`Ok(())`

Call to `write!` now looks the same as in more complex case shown above:
`impl Debug for Secret`	`impl Debug for Secret`
`#[throws(fmt::Error)]`
`fn fmt(&self, f: &mut fmt::Formatter)`	`fn fmt(&self, f: &mut fmt::Formatter)-> fmt::Result`
`write!(f, "Secret(***)")?;`	`write!(f, "Secret(***)")`


Much tiresome `return Ok()` noise removed:
`impl FromStr for SectionName`	`impl FromStr for SectionName`
`type Err = AE;`	`type Err = AE;`
`#[throws(AE)]`
`fn from_str(s: &str) -> Self`	`fn from_str(s: &str) ->AR< Self>`
`match s`	`match s`
`COMMON => return SN::Common,`	`COMMON => return Ok(SN::Common),`
`LIMIT => return SN::GlobalLimit,`	`LIMIT => return Ok(SN::GlobalLimit),`
`_ =>`	`_ =>`
`;`	`;`
`if let Ok(n@ ServerName(_)) = s.parse() return SN::Server(n)`	`if let Ok(n@ ServerName(_)) = s.parse() return Ok(SN::Server(n))`
`if let Ok(n@ ClientName(_)) = s.parse() return SN::Client(n)`	`if let Ok(n@ ClientName(_)) = s.parse() return Ok(SN::Client(n))`

`if client == LIMIT return SN::ServerLimit(server)`	`if client == LIMIT return Ok(SN::ServerLimit(server))`
`let client = client.parse().context( client name in link section name )?;`	`let client = client.parse().context( client name in link section name )?;`
`SN::Link(LinkName server, client )`	`Ok(SN::Link(LinkName server, client ))`

edited 2022-12-18 19:58 UTC to improve, and 2022-12-18 23:28 to fix, formatting

comments

Russell Coker: Wall Facers

I m currently reading the second book of the TriSolar Sci-Fi series by Cixin Liu, I ve only just started it so this post can t have spoilers for it and I will also only have minimal spoilers for the first book (nothing more than you will get from pop culture references to it). In the second book there are people called Wall Facers who have broad powers to shape the course of the Human response to an alien invasion in 400+ years time. The idea is that as the aliens have an ability to see everything that can be seen on Earth any ideas that leave the brain of one person can be snooped on, so if some people act independently without communicating their plans they can take the aliens by surprise. While that is probably going to work out well in the books history in general seems to show that people who act independently without any useful feedback from others tend to perform poorly, every king and dictator seems to demonstrate this. Efficient Work I ve been thinking about what I would do if I had significant powers to guide the response to an alien threat in some hundreds of years. The first thing to do would be to get all people working as efficiently as possible. Without the imminent threat of alien invasion we can have debates about how much time to spend working vs leisure time. Should we make 24 hours per week the new normal work week? But if the threat of annihilation is looming then the discussion should be about how to get as many people as possible working as much as possible. Currently 1/4 of the world population lack access to safe drinking water [1], there s a plan to achieve universal and equitable access to safe and affordable drinking water for all by 2030 . But 2030 isn t soon enough, another 8 years where 1/4 of children born won t reach their potential due to poor water is unacceptable. Currently 13% of the world population don t have access to electricity and 40% don t have access to clean fuels for cooking [2]. Lack of energy access reduces health and opportunities for education. Healthcare is another major obstacle to human development and therefore economic development. Even some allegedly first-world countries like the US lack universal affordable healthcare. I think we could reasonably get safe water to 99% of the world population before 2025 if we tried hard (IE applied a small fraction of the resources of a single war to it). Getting electricity to 95% of the world population and clean cooking fuels to 90% of the world population are probably achievable goals for 2025 as well. Healthcare is a slightly harder problem as we need to train more nurses and doctors. A registered nurse apparently needs 3 years of training after completing high school. We may have to improve high schools to get more students up to the standard of nursing degrees. If it takes 3 years to improve schools in year 9+ and then 3 years to get more high school graduates that would mean that it would take about 9 years to get an increase in nurses. Doing this would require increasing the capacity of universities and making university almost free (as it was for decades). So in about 2031 we could start sending a significant number of nurses from developed countries to help out developing countries. Becoming a doctor apparently requires 8 years of study plus a minimum of 3 years residency . So if doctors were entirely trained in first world countries then we wouldn t be able to send many doctors to developing countries until 2039. If the residency was performed in other countries then it could be as early as 2036. According to the WHO currently only half the world s population have adequate healthcare [3]. To get adequate healthcare to the world we need to more than double the number of doctors because currently we don t have enough in countries with decent healthcare systems such as Australia. It would probably take to at least 2060 to get enough doctors trained. The end goal of course would be to have every country able to train enough of it s citizens to provide all medical services, but countries that have serious widespread healthcare problems that reduce the number of people who can pursue higher education will have difficulty in that until some of the healthcare problems are alleviated. Education Obviously education is important to all achievements. Currently education seems very poorly run, it is possible to create a school system that teaches children effectively without the bullying that is common in Australia and without the sort of pressure that South Korea is infamous for. One of the main issues to resolve with the school system is the idea that everyone should learn at the same speed, that goal can only be achieved by making the majority of the students learn slowly. Students should be able to freely skip ahead as their skill permits and finish school at any age. Also high school isn t for everyone, the tech schools that teach trades need to be brought back. Deceiving Aliens A plot point in the TriSolar series is that the aliens can see each other s thoughts, the local communication (their equivalent to talking) is based on reading each other s thoughts without the possibility of deception. While deceptive written communication is potentially possible for them they haven t developed skills in that area. As a first step towards exploiting this humans could focus more on linguistic development that increases language complexity, such as the way the English language adopts words from other languages and gives them slightly different meanings for example the difference between driver and chauffeur and the difference between dog and hound is not obvious to many Europeans who otherwise speak English fluently. When involved in conversation it s possible to convey meaning without directly stating things, this is used extensively by people who are interested in security. My observations of this are based on conversations with people who do government work, but I imagine that criminal organisations also do similar things for similar reasons. An increased focus on poetry in schools might be helpful in developing skills for conveying ideas to people who think in human ways where the message is unclear to non-humans who have no experience of deception. I wonder whether the ability to understand human poetry would make aliens less hostile to humans, if they can think like us then they would be less likely to want to exterminate us. Poker is a game that depends on the ability to deceive others, I ve never been any good at it. I wonder if making it part of the school curriculum would help improve the overall human ability to deceive aliens. I don t think that such schools would become dens of sociopathy as depicted in Kakegurui, but it might have some negative results. Spreading education to a larger portion of the world s population requires more use of electronic education. Anything learned via text can be more easily assimilated by aliens than things that are learned directly from other people. For high school and the basics of a university degree this is fine. But for more advanced education it seems that having a large face to face component might help keep the value away from the aliens. More Ideas? What do you think I missed on this list? I wasn t trying to list every possibility, just the more important ones. Also for any goals other than increasing inequality for it s own sake we should improve health and education for the world.

8 December 2022

Reproducible Builds: Reproducible Builds in November 2022

Welcome to yet another report from the Reproducible Builds project, this time for November 2022. In all of these reports (which we have been publishing regularly since May 2015) we attempt to outline the most important things that we have been up to over the past month. As always, if you interested in contributing to the project, please visit our Contribute page on our website.

Reproducible Builds Summit 2022 Following-up from last month s report about our recent summit in Venice, Italy, a comprehensive report from the meeting has not been finalised yet watch this space! As a very small preview, however, we can link to several issues that were filed about the website during the summit (#38, #39, #40, #41, #42, #43, etc.) and collectively learned about Software Bill of Materials (SBOM) s and how `.buildinfo` files can be seen/used as SBOMs. And, no less importantly, the Reproducible Builds t-shirt design has been updated

Reproducible Builds at European Cyber Week 2022 During the European Cyber Week 2022, a Capture The Flag (CTF) cybersecurity challenge was created by Fr d ric Pierret on the subject of Reproducible Builds. The challenge consisted in a pedagogical sense based on how to make a software release reproducible. To progress through the challenge issues that affect the reproducibility of build (such as build path, timestamps, file ordering, etc.) were to be fixed in steps in order to get the final flag in order to win the challenge. At the end of the competition, five people succeeded in solving the challenge, all of whom were awarded with a shirt. Fr d ric Pierret intends to create similar challenge in the form of a how to in the Reproducible Builds documentation, but two of the 2022 winners are shown here:

On business adoption and use of reproducible builds Simon Butler announced on the rb-general mailing list that the Software Quality Journal published an article called On business adoption and use of reproducible builds for open and closed source software. This article is an interview-based study which focuses on the adoption and uses of Reproducible Builds in industry, with a focus on investigating the reasons why organisations might not have adopted them:
[ ] industry application of R-Bs appears limited, and we seek to understand whether awareness is low or if significant technical and business reasons prevent wider adoption.
This is achieved through interviews with software practitioners and business managers, and touches on both the business and technical reasons supporting the adoption (or not) of Reproducible Builds. The article also begins with an excellent explanation and literature review, and even introduces a new helpful analogy for reproducible builds:
[Users are] able to perform a bitwise comparison of the two binaries to verify that they are identical and that the distributed binary is indeed built from the source code in the way the provider claims. Applied in this manner, R-Bs function as a canary, a mechanism that indicates when something might be wrong, and offer an improvement in security over running unverified binaries on computer systems.
The full paper is available to download on an open access basis. Elsewhere in academia, Beatriz Michelson Reichert and Rafael R. Obelheiro have published a paper proposing a systematic threat model for a generic software development pipeline identifying possible mitigations for each threat (PDF). Under the Tampering rubric of their paper, various attacks against Continuous Integration (CI) processes:
An attacker may insert a backdoor into a CI or build tool and thus introduce vulnerabilities into the software (resulting in an improper build). To avoid this threat, it is the developer s responsibility to take due care when making use of third-party build tools. Tampered compilers can be mitigated using diversity, as in the diverse double compiling (DDC) technique. Reproducible builds, a recent research topic, can also provide mitigation for this problem. (PDF)

Misc news

A change was proposed for the Go programming language to enable reproducible builds when Link Time Optimisation (LTO) is enabled. As mentioned in the changelog, Morten Linderud s patch fixes two issues when the linker used in conjunction with the `-flto` option: the first involves solving an issue related to seeded random numbers; and the second involved the binary embedding the current working directory in compressed sections of the LTO object. Both of these issues made the build unreproducible.

In the .NET framework ecosystem, a wiki page for the Roslyn .NET C# and Visual Basic compiler was uncovered this month that details its attempts to ensure end-to-end reproducible builds by focusing on the definition on what are considered inputs to the compiler for the purpose of determinism . This is a spiritual followup to a 2016 blog post by Microsoft developer Jared Parsons on Deterministic builds in Roslyn which starts: It seems silly to celebrate features which should have been there from the start.

Ian Lance Taylor followed up an old post to report that Jakub Jelinek s patch from September 2000 is incomplete.

In F-Droid this month, Reproducible Builds contributor FC Stegerman created a set of reproducible APK tools as a workaround for issues like the order of files in APKs built on macOS being non-deterministic. In addition, the new issue documenting the overview of apps using reproducible builds shows that F-Droid added 11 new apps that use reproducible builds, and FC Stegerman released apksigcopier version 1.1.0 which adds support for APKs signed by Signflinger .

martinSusz has written up a fascinating wiki page describing how to generate quasi-reproducible firmware ROMs for System-on-a-Chip (SoC) components fabricated by Rock Chip. These chips are used in popular low-cost laptops such as the Pine64 PinebookPro and Asus C201. The link is worth viewing simply for the interesting diagram.

Our monthly IRC meeting was held on November 29th 2022. Our next meeting will be on January 31st 2023; we ll skip the meeting in December due to the proximity to Christmas, etc.

On our mailing list this month:

Adrian Diglio from Microsoft asked How to Add a New Project within Reproducible Builds which solicited a number of replies.

Vagrant Cascadian posed an interesting question regarding the difference between test builds vs rebuilds (or verification rebuilds ). As Vagrant poses in their message, they re both useful for slightly different purposes, and it might be good to clarify the distinction [ ].

Debian & other Linux distributions Over 50 reviews of Debian packages were added this month, another 48 were updated and almost 30 were removed, all of which adds to our knowledge about identified issues. Two new issue types were added as well. [ ][ ]. Vagrant Cascadian announced on our mailing list another online sprint to help clear the huge backlog of reproducible builds patches submitted by performing NMUs (Non-Maintainer Uploads). The first such sprint took place on September 22nd, but others were held on October 6th and October 20th. There were two additional sprints that occurred in November, however, which resulted in the following progress:

Chris Lamb:

`paxctl` (Fixed #1020804)

`png23d` (Fixed #1020805)

`tuxcmd-modules` (Fixed #1011500 & #941296)

`waili` (Fixed #1020751)

`zephyr` (Fixed #828867 #1021374)

Vagrant Cascadian:

`ddd` (Fixed #834016)

`libpam-ldap` (Fixed #834050)

`nsnake` (Fixed #833612)

`quvi` (Fixed #835259)

`stressapptest` (Fixed #831587 & #986653)

`tcpreen` (Fixed #831585)

`boolector` (Fixed #1023886)

`tsdecrypt` (Fixed #829713 & #1022130)

`wbxml2` (QA upload fixed build path issues)

`tercpp` (QA upload fixed build path issues)

Lastly, Roland Clobus posted his latest update of the status of reproducible Debian ISO images on our mailing list. This reports that all major desktops build reproducibly with bullseye, bookworm and sid as well as that no custom patches needed to applied to Debian unstable for this result to occur. During November, however, Roland proposed some modifications to live-setup and the rebuild script has been adjusted to fix the failing Jenkins tests for Debian bullseye [ ][ ].
In other news, Miro Hron ok proposed a change to clamp build modification times to the value of `SOURCE_DATE_EPOCH`. This was initially suggested and discussed on a `devel@` mailing list post but was later written up on the Fedora Wiki as well as being officially proposed to Fedora Engineering Steering Committee (FESCo).

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Bernhard M. Wiedemann:

`dwz` (Profile-guided optimisation issue)

`icmake` (filesystem ordering issue)

`llmnrd`

`elixir` (report a bug re. stuck build on single-core VMs)

`warzone2100` (report a bug re. parallelism-dependent output)

Chris Lamb:

#1023589 filed against `libnvme`.

#1024352 filed against `pykafka`.

Vagrant Cascadian:

#1023886 filed against `boolector`.

#1023956 filed against `fl-cow`.

#1023957 filed against `gerstensaft`.

#1023960 filed against `libcgicc`.

#1024007 filed against `haskell98-report`.

#1024125 filed against `ucspi-proxy`.

#1024126 filed against `hunt`.

#1024279 filed against `tolua++`.

#1024282 filed against `twoftpd`.

#1024283 filed against `ipsvd`.

#1024284 filed against `gentoo`.

#1024286 filed against `lcm`.

#1024288 filed against `apcupsd`.

#1024289 filed against `openfortivpn`.

#1024290 filed against `xtb`.

#1024291 filed against `gnunet`.

#1024292 filed against `swift-im`.

#1024396 filed against `brewtarget`.

#1024399 filed against `xrprof`.

#1024404 filed against `gitlint`.

#1024412 filed against `claws-mail`.

#1024413 filed against `presage`.

#1024530 filed against `jh7100-bootloader-recovery`.

Victor Westerhuis:

#1024482 & #1024638 filed against `opencv`.

John Neffenger:

`tomcat` (Fixed Apache bug #66346)

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions `226` and `227` to Debian:

Support both `python3-progressbar` and `python3-progressbar2`, two modules providing the `progressbar` Python module. [ ]

Don t run Python decompiling tests on Python bytecode that `file(1)` cannot detect yet and Python 3.11 cannot unmarshal. (#1024335)

Don t attempt to attach text-only differences notice if there are no differences to begin with. (#1024171)

Make sure we recommend `apksigcopier`. [ ]

Tidy generation of `os_list`. [ ]

Make the code clearer around generating the Debian substvars . [ ]

Use our `assert_diff` helper in `test_lzip.py`. [ ]

Drop other copyright notices from `lzip.py` and `test_lzip.py`. [ ]

In addition to this, Christopher Baines added lzip support [ ], and FC Stegerman added an optimisation whereby we don t run `apktool` if no differences are detected before the signing block [ ].
A significant number of changes were made to the Reproducible Builds website and documentation this month, including Chris Lamb ensuring the openEuler logo is correctly visible with a white background [ ], FC Stegerman de-duplicated by email address to avoid listing some contributors twice [ ], Herv Boutemy added Apache Maven to the list of affiliated projects [ ] and boyska updated our Contribute page to remark that the Reproducible Builds presence on salsa.debian.org is not just the Git repository but is also for creating issues [ ][ ]. In addition to all this, however, Holger Levsen made the following changes:

Add a number of existing publications [ ][ ] and update metadata for some existing publications as well [ ].

Hide draft posts on the website homepage. [ ]

Add the Warpforge build tool as a participating project of the summit. [ ]

Clarify in the footer that we welcome patches to the website repository. [ ]

Testing framework The Reproducible Builds project operates a comprehensive testing framework at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In October, the following changes were made by Holger Levsen:

Improve the generation of meta package sets (used in grouping packages for reporting/statistical purposes) to treat Debian bookworm as equivalent to Debian unstable in this specific case [ ] and to parse the list of packages used in the Debian cloud images [ ][ ][ ].

Temporarily allow Frederic to `ssh(1)` into our snapshot server as the `jenkins` user. [ ]

Keep some reproducible jobs Jenkins logs much longer [ ] (later reverted).

Improve the node health checks to detect failures to update the Debian cloud image package set [ ][ ] and to improve prioritisation of some kernel warnings [ ].

Always echo any IRC output to Jenkins output as well. [ ]

Deal gracefully with problems related to processing the cloud image package set. [ ]

Finally, Roland Clobus continued his work on testing Live Debian images, including adding support for specifying the origin of the Debian installer [ ] and to warn when the image has unmet dependencies in the package list (e.g. due to a transition) [ ].
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can get in touch with us via:

IRC: `#reproducible-builds` on `irc.oftc.net`.

Twitter: @ReproBuilds

Mailing list: `rb-general@lists.reproducible-builds.org`

2 December 2022

Reproducible Builds (diffoscope): diffoscope 228 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 228. This version includes the following changes:

[ FC Stegerman ]
* As an optimisation, don't run apktool if no differences are detected before
  the signing block. (Closes: reproducible-builds/diffoscope!105)
[ Chris Lamb ]
* Support both the python3-progressbar and python3-progressbar2 Debian
  packages, two modules providing the "progressbar" Python module.
  (Closes: reproducible-builds/diffoscope#323)
* Ensure we recommend apksigcopier. (Re: reproducible-builds/diffoscope!105)
* Make the code clearer around generating the Debian substvars and tidy
  generation of os_list.
* Update copyright years.

You find out more by visiting the project homepage.

30 November 2022

Matthew Garrett: Making unphishable 2FA phishable

One of the huge benefits of WebAuthn is that it makes traditional phishing attacks impossible. An attacker sends you a link to a site that looks legitimate but isn't, and you type in your credentials. With SMS or TOTP-based 2FA, you type in your second factor as well, and the attacker now has both your credentials and a legitimate (if time-limited) second factor token to log in with. WebAuthn prevents this by verifying that the site it's sending the secret to is the one that issued it in the first place - visit an attacker-controlled site and said attacker may get your username and password, but they won't be able to obtain a valid WebAuthn response.

But what if there was a mechanism for an attacker to direct a user to a legitimate login page, resulting in a happy WebAuthn flow, and obtain valid credentials for that user anyway? This seems like the lead-in to someone saying "The Aristocrats", but unfortunately it's (a) real, (b) RFC-defined, and (c) implemented in a whole bunch of places that handle sensitive credentials. The villain of this piece is RFC 8628, and while it exists for good reasons it can be used in a whole bunch of ways that have unfortunate security consequences.

What is the RFC 8628-defined Device Authorization Grant, and why does it exist? Imagine a device that you don't want to type a password into - either it has no input devices at all (eg, some IoT thing) or it's awkward to type a complicated password (eg, a TV with an on-screen keyboard). You want that device to be able to access resources on behalf of a user, so you want to ensure that that user authenticates the device. RFC 8628 describes an approach where the device requests the credentials, and then presents a code to the user (either on screen or over Bluetooth or something), and starts polling an endpoint for a result. The user visits a URL and types in that code (or is given a URL that has the code pre-populated) and is then guided through a standard auth process. The key distinction is that if the user authenticates correctly, the issued credentials are passed back to the device rather than the user - on successful auth, the endpoint the device is polling will return an oauth token.

But what happens if it's not a device that requests the credentials, but an attacker? What if said attacker obfuscates the URL in some way and tricks a user into clicking it? The user will be presented with their legitimate ID provider login screen, and if they're using a WebAuthn token for second factor it'll work correctly (because it's genuinely talking to the real ID provider!). The user will then typically be prompted to approve the request, but in every example I've seen the language used here is very generic and doesn't describe what's going on or ask the user. AWS simply says "An application or device requested authorization using your AWS sign-in" and has a big "Allow" button, giving the user no indication at all that hitting "Allow" may give a third party their credentials.

This isn't novel! Christoph Tafani-Dereeper has an excellent writeup on this topic from last year, which builds on Nestori Syynimaa's earlier work. But whenever I've talked about this, people seem surprised at the consequences. WebAuthn is supposed to protect against phishing attacks, but this approach subverts that protection by presenting the user with a legitimate login page and then handing their credentials to someone else.

RFC 8628 actually recognises this vector and presents a set of mitigations. Unfortunately nobody actually seems to implement these, and most of the mitigations are based around the idea that this flow will only be used for physical devices. Sadly, AWS uses this for initial authentication for the aws-cli tool, so there's no device in that scenario. Another mitigation is that there's a relatively short window where the code is valid, and so sending a link via email is likely to result in it expiring before the user clicks it. An attacker could avoid this by directing the user to a domain under their control that triggers the flow and then redirects the user to the login page, ensuring that the code is only generated after the user has clicked the link.

Can this be avoided? The best way to do so is to ensure that you don't support this token issuance flow anywhere, or if you do then ensure that any tokens issued that way are extremely narrowly scoped. Unfortunately if you're an AWS user, that's probably not viable - this flow is required for the cli tool to perform SSO login, and users are going to end up with broadly scoped tokens as a result. The logs are also not terribly useful.

The infuriating thing is that this isn't necessary for CLI tooling. The reason this approach is taken is that you need a way to get the token to a local process even if the user is doing authentication in a browser. This can be avoided by having the process listen on localhost, and then have the login flow redirect to localhost (including the token) on successful completion. In this scenario the attacker can't get access to the token without having access to the user's machine, and if they have that they probably have access to the token anyway.

There's no real moral here other than "Security is hard". Sorry.

comments

Russell Coker: Links November 2022

Here s the US Senate Statement of Frances Haugen who used to work for Facebook countering misinformation and espionage [1]. She believes that Facebook is capable of dealing with the online radicalisation and promotion of bad things on it s platform but is unwilling to do so for financial reasons. We need strong regulation of Facebook and it probably needs to be broken up. Interesting article from The Atlantic about filtered cigarettes being more unhealthy than unfiltered [2]. Every time I think I know how evil tobacco companies are I get surprised by some new evidence. Cory Doctorow wrote an insightful article about resistance to rubber hose cryptanalysis [3]. Cory Doctorow wrote an interesting article When Automation Becomes Enforcement with a new way of thinking about Snapchat etc [4]. Cory Doctorow wrote an insightful and informative article Big Tech Isn t Stealing News Publishers Content, It s Stealing Their Money [5] which should be read by politicians from all countries that are trying to restrict quoting news on the Internet. Interesting articl;e on Santiago Genoves who could be considered as a pioneer of reality TV for deliberately creating disputes between a group of young men and women on a raft in the Atlantic for 3 months [6]. Matthew Garrett wrote an interesting review of the Freedom Phone, seems that it s not good for privacy and linked to some companies doing weird stuff [7]. Definitely worth reading. Cory Doctorow wrote an interesting and amusing article about backdoors for machine learning [8] Petter Reinholdtsen wrote an informative post on how to make a bootable USB stick image from an ISO file [9]. Apparently Lenovo provides ISO images to update laptops that don t have DVD drives. :( Barry Gander wrote an interesting article about the fall of Rome and the decline of the US [10]. It s a great concern that the US might fail in the same way as Rome. Ethan Siegel wrote an interesting article about Iapetus, a moon of Saturn that is one of the strangest objects in the solar system [11]. Cory Doctorow s article Revenge of the Chickenized Reverse-Centaurs has some good insights into the horrible things that companies like Amazon are doing to their employees and how we can correct that [12]. Charles Stross wrote an insightful blog post about Billionaires [13]. They can t do much for themselves with the extra money beyond about $10m or $100m (EG Steve Jobs was unable to extend his own life much when he had cancer) and their money is trivial when compared to the global economy. They are however effective parasites capable of performing great damage to the country that hosts them. Cory Doctorow has an interesting article about how John Deere is being evil again [14]. This time with potentially catastrophic results.

[1] https://tinyurl.com/yeppy78q
[2] https://tinyurl.com/28yt5wbg
[3] https://onezero.medium.com/rubber-hoses-fd685385dcd4
[4] https://tinyurl.com/2ztqaw9a
[5] https://tinyurl.com/2kv68veo
[6] https://en.culturess.com/view/?id=santiago-genoves-experiment-cul
[7] https://mjg59.dreamwidth.org/59479.html
[8] https://tinyurl.com/yy8pyg65
[9] https://tinyurl.com/2qr9evgb
[10] https://tinyurl.com/2ngytzuh
[11] https://tinyurl.com/2qjffrt3
[12] https://tinyurl.com/2ote6mw7
[13] https://tinyurl.com/yxe75yel
[14] https://tinyurl.com/y23lbw2n

16 November 2022

Antoine Beaupr : A ZFS migration

In my tubman setup, I started using ZFS on an old server I had lying around. The machine is really old though (2011!) and it "feels" pretty slow. I want to see how much of that is ZFS and how much is the machine. Synthetic benchmarks show that ZFS may be slower than mdadm in RAID-10 or RAID-6 configuration, so I want to confirm that on a live workload: my workstation. Plus, I want easy, regular, high performance backups (with send/receive snapshots) and there's no way I'm going to use BTRFS because I find it too confusing and unreliable. So off we go.

Installation Since this is a conversion (and not a new install), our procedure is slightly different than the official documentation but otherwise it's pretty much in the same spirit: we're going to use ZFS for everything, including the root filesystem. So, install the required packages, on the current system:

apt install --yes gdisk zfs-dkms zfs zfs-initramfs zfsutils-linux

We also tell DKMS that we need to rebuild the initrd when upgrading:

echo REMAKE_INITRD=yes > /etc/dkms/zfs.conf

Partitioning This is going to partition /dev/sdc with:

1MB MBR / BIOS legacy boot
512MB EFI boot
1GB bpool, unencrypted pool for /boot

rest of the disk for zpool, the rest of the data

 sgdisk --zap-all /dev/sdc
 sgdisk -a1 -n1:24K:+1000K -t1:EF02 /dev/sdc
 sgdisk     -n2:1M:+512M   -t2:EF00 /dev/sdc
 sgdisk     -n3:0:+1G      -t3:BF01 /dev/sdc
 sgdisk     -n4:0:0        -t4:BF00 /dev/sdc

That will look something like this:

    root@curie:/home/anarcat# sgdisk -p /dev/sdc
    Disk /dev/sdc: 1953525168 sectors, 931.5 GiB
    Model: ESD-S1C         
    Sector size (logical/physical): 512/512 bytes
    Disk identifier (GUID): [REDACTED]
    Partition table holds up to 128 entries
    Main partition table begins at sector 2 and ends at sector 33
    First usable sector is 34, last usable sector is 1953525134
    Partitions will be aligned on 16-sector boundaries
    Total free space is 14 sectors (7.0 KiB)
    Number  Start (sector)    End (sector)  Size       Code  Name
       1              48            2047   1000.0 KiB  EF02  
       2            2048         1050623   512.0 MiB   EF00  
       3         1050624         3147775   1024.0 MiB  BF01  
       4         3147776      1953525134   930.0 GiB   BF00

Unfortunately, we can't be sure of the sector size here, because the USB controller is probably lying to us about it. Normally, this smartctl command should tell us the sector size as well:

root@curie:~# smartctl -i /dev/sdb -qnoserial
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-14-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Black Mobile
Device Model:     WDC WD10JPLX-00MBPT0
Firmware Version: 01.01H01
User Capacity:    1 000 204 886 016 bytes [1,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue May 17 13:33:04 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Above is the example of the builtin HDD drive. But the SSD device enclosed in that USB controller doesn't support SMART commands, so we can't trust that it really has 512 bytes sectors. This matters because we need to tweak the ashift value correctly. We're going to go ahead the SSD drive has the common 4KB settings, which means ashift=12. Note here that we are not creating a separate partition for swap. Swap on ZFS volumes (AKA "swap on ZVOL") can trigger lockups and that issue is still not fixed upstream. Ubuntu recommends using a separate partition for swap instead. But since this is "just" a workstation, we're betting that we will not suffer from this problem, after hearing a report from another Debian developer running this setup on their workstation successfully. We do not recommend this setup though. In fact, if I were to redo this partition scheme, I would probably use LUKS encryption and setup a dedicated swap partition, as I had problems with ZFS encryption as well.

Creating pools ZFS pools are somewhat like "volume groups" if you are familiar with LVM, except they obviously also do things like RAID-10. (Even though LVM can technically also do RAID, people typically use mdadm instead.) In any case, the guide suggests creating two different pools here: one, in cleartext, for boot, and a separate, encrypted one, for the rest. Technically, the boot partition is required because the Grub bootloader only supports readonly ZFS pools, from what I understand. But I'm a little out of my depth here and just following the guide.

Boot pool creation This creates the boot pool in readonly mode with features that grub supports:

    zpool create \
        -o cachefile=/etc/zfs/zpool.cache \
        -o ashift=12 -d \
        -o feature@async_destroy=enabled \
        -o feature@bookmarks=enabled \
        -o feature@embedded_data=enabled \
        -o feature@empty_bpobj=enabled \
        -o feature@enabled_txg=enabled \
        -o feature@extensible_dataset=enabled \
        -o feature@filesystem_limits=enabled \
        -o feature@hole_birth=enabled \
        -o feature@large_blocks=enabled \
        -o feature@lz4_compress=enabled \
        -o feature@spacemap_histogram=enabled \
        -o feature@zpool_checkpoint=enabled \
        -O acltype=posixacl -O canmount=off \
        -O compression=lz4 \
        -O devices=off -O normalization=formD -O relatime=on -O xattr=sa \
        -O mountpoint=/boot -R /mnt \
        bpool /dev/sdc3

I haven't investigated all those settings and just trust the upstream guide on the above.

Main pool creation This is a more typical pool creation.

    zpool create \
        -o ashift=12 \
        -O encryption=on -O keylocation=prompt -O keyformat=passphrase \
        -O acltype=posixacl -O xattr=sa -O dnodesize=auto \
        -O compression=zstd \
        -O relatime=on \
        -O canmount=off \
        -O mountpoint=/ -R /mnt \
        rpool /dev/sdc4

Breaking this down:

-o ashift=12: mentioned above, 4k sector size
-O encryption=on -O keylocation=prompt -O keyformat=passphrase: encryption, prompt for a password, default algorithm is aes-256-gcm, explicit in the guide, made implicit here
-O acltype=posixacl -O xattr=sa: enable ACLs, with better performance (not enabled by default)
-O dnodesize=auto: related to extended attributes, less compatibility with other implementations
-O compression=zstd: enable zstd compression, can be disabled/enabled by dataset to with zfs set compression=off rpool/example
-O relatime=on: classic atime optimisation, another that could be used on a busy server is atime=off
-O canmount=off: do not make the pool mount automatically with mount -a?
-O mountpoint=/ -R /mnt: mount pool on / in the future, but /mnt for now

Those settings are all available in zfsprops(8). Other flags are defined in zpool-create(8). The reasoning behind them is also explained in the upstream guide and some also in [the Debian wiki][]. Those flags were actually not used:

-O normalization=formD: normalize file names on comparisons (not storage), implies utf8only=on, which is a bad idea (and effectively meant my first sync failed to copy some files, including this folder from a supysonic checkout). and this cannot be changed after the filesystem is created. bad, bad, bad.

[the Debian wiki]: https://wiki.debian.org/ZFS#Advanced_Topics

Side note about single-disk pools Also note that we're living dangerously here: single-disk ZFS pools are rumoured to be more dangerous than not running ZFS at all. The choice quote from this article is:
[...] any error can be detected, but cannot be corrected. This sounds like an acceptable compromise, but its actually not. The reason its not is that ZFS' metadata cannot be allowed to be corrupted. If it is it is likely the zpool will be impossible to mount (and will probably crash the system once the corruption is found). So a couple of bad sectors in the right place will mean that all data on the zpool will be lost. Not some, all. Also there's no ZFS recovery tools, so you cannot recover any data on the drives.
Compared with (say) ext4, where a single disk error can recovered, this is pretty bad. But we are ready to live with this with the idea that we'll have hourly offline snapshots that we can easily recover from. It's trade-off. Also, we're running this on a NVMe/M.2 drive which typically just blinks out of existence completely, and doesn't "bit rot" the way a HDD would. Also, the FreeBSD handbook quick start doesn't have any warnings about their first example, which is with a single disk. So I am reassured at least.

Creating mount points Next we create the actual filesystems, known as "datasets" which are the things that get mounted on mountpoint and hold the actual files.

this creates two containers, for ROOT and BOOT
```
 zfs create -o canmount=off -o mountpoint=none rpool/ROOT &&
 zfs create -o canmount=off -o mountpoint=none bpool/BOOT
```
Note that it's unclear to me why those datasets are necessary, but they seem common practice, also used in this FreeBSD example. The OpenZFS guide mentions the Solaris upgrades and Ubuntu's zsys that use that container for upgrades and rollbacks. This blog post seems to explain a bit the layout behind the installer.
this creates the actual boot and root filesystems:
```
 zfs create -o canmount=noauto -o mountpoint=/ rpool/ROOT/debian &&
 zfs mount rpool/ROOT/debian &&
 zfs create -o mountpoint=/boot bpool/BOOT/debian
```
I guess the debian name here is because we could technically have multiple operating systems with the same underlying datasets.

then the main datasets:

 zfs create                                 rpool/home &&
 zfs create -o mountpoint=/root             rpool/home/root &&
 chmod 700 /mnt/root &&
 zfs create                                 rpool/var

exclude temporary files from snapshots:

 zfs create -o com.sun:auto-snapshot=false  rpool/var/cache &&
 zfs create -o com.sun:auto-snapshot=false  rpool/var/tmp &&
 chmod 1777 /mnt/var/tmp

and skip automatic snapshots in Docker:
```
 zfs create -o canmount=off                 rpool/var/lib &&
 zfs create -o com.sun:auto-snapshot=false  rpool/var/lib/docker
```
Notice here a peculiarity: we must create rpool/var/lib to create rpool/var/lib/docker otherwise we get this error:
```
 cannot create 'rpool/var/lib/docker': parent does not exist
```
... and no, just creating /mnt/var/lib doesn't fix that problem. In fact, it makes things even more confusing because an existing directory shadows a mountpoint, which is the opposite of how things normally work. Also note that you will probably need to change storage driver in Docker, see the zfs-driver documentation for details but, basically, I did:
```
echo '  "storage-driver": "zfs"  ' > /etc/docker/daemon.json
```
Note that podman has the same problem (and similar solution):
```
printf '[storage]\ndriver = "zfs"\n' > /etc/containers/storage.conf
```

make a tmpfs for /run:

 mkdir /mnt/run &&
 mount -t tmpfs tmpfs /mnt/run &&
 mkdir /mnt/run/lock

We don't create a /srv, as that's the HDD stuff. Also mount the EFI partition:

mkfs.fat -F 32 /dev/sdc2 &&
mount /dev/sdc2 /mnt/boot/efi/

At this point, everything should be mounted in /mnt. It should look like this:

root@curie:~# LANG=C df -h -t zfs -t vfat
Filesystem            Size  Used Avail Use% Mounted on
rpool/ROOT/debian     899G  384K  899G   1% /mnt
bpool/BOOT/debian     832M  123M  709M  15% /mnt/boot
rpool/home            899G  256K  899G   1% /mnt/home
rpool/home/root       899G  256K  899G   1% /mnt/root
rpool/var             899G  384K  899G   1% /mnt/var
rpool/var/cache       899G  256K  899G   1% /mnt/var/cache
rpool/var/tmp         899G  256K  899G   1% /mnt/var/tmp
rpool/var/lib/docker  899G  256K  899G   1% /mnt/var/lib/docker
/dev/sdc2             511M  4.0K  511M   1% /mnt/boot/efi

Now that we have everything setup and mounted, let's copy all files over.

Copying files This is a list of all the mounted filesystems

for fs in /boot/ /boot/efi/ / /home/; do
    echo "syncing $fs to /mnt$fs..." && 
    rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
done

You can check that the list is correct with:

mount -l -t ext4,btrfs,vfat   awk ' print $3 '

Note that we skip /srv as it's on a different disk. On the first run, we had:

root@curie:~# for fs in /boot/ /boot/efi/ / /home/; do
        echo "syncing $fs to /mnt$fs..." && 
        rsync -aSHAXx --info=progress2 $fs /mnt$fs
    done
syncing /boot/ to /mnt/boot/...
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/299)  
syncing /boot/efi/ to /mnt/boot/efi/...
     16,831,437 100%  184.14MB/s    0:00:00 (xfr#101, to-chk=0/110)
syncing / to /mnt/...
 28,019,293,280  94%   47.63MB/s    0:09:21 (xfr#703710, ir-chk=6748/839220)rsync: [generator] delete_file: rmdir(var/lib/docker) failed: Device or resource busy (16)
could not make way for new symlink: var/lib/docker
 34,081,267,990  98%   50.71MB/s    0:10:40 (xfr#736577, to-chk=0/867732)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
syncing /home/ to /mnt/home/...
rsync: [sender] readlink_stat("/home/anarcat/.fuse") failed: Permission denied (13)
 24,456,268,098  98%   68.03MB/s    0:05:42 (xfr#159867, ir-chk=6875/172377) 
file has vanished: "/home/anarcat/.cache/mozilla/firefox/s2hwvqbu.quantum/cache2/entries/B3AB0CDA9C4454B3C1197E5A22669DF8EE849D90"
199,762,528,125  93%   74.82MB/s    0:42:26 (xfr#1437846, ir-chk=1018/1983979)rsync: [generator] recv_generator: mkdir "/mnt/home/anarcat/dist/supysonic/tests/assets/\#346" failed: Invalid or incomplete multibyte or wide character (84)
*** Skipping any contents from this failed directory ***
315,384,723,978  96%   76.82MB/s    1:05:15 (xfr#2256473, to-chk=0/2993950)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]

Note the failure to transfer that supysonic file? It turns out they had a weird filename in their source tree, since then removed, but still it showed how the utf8only feature might not be such a bad idea. At this point, the procedure was restarted all the way back to "Creating pools", after unmounting all ZFS filesystems (

umount
/mnt/run /mnt/boot/efi && umount -t zfs -a

) and destroying the pool, which, surprisingly, doesn't require any confirmation (

zpool destroy
rpool

). The second run was cleaner:

root@curie:~# for fs in /boot/ /boot/efi/ / /home/; do
        echo "syncing $fs to /mnt$fs..." && 
        rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
    done
syncing /boot/ to /mnt/boot/...
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/299)  
syncing /boot/efi/ to /mnt/boot/efi/...
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/110)  
syncing / to /mnt/...
 28,019,033,070  97%   42.03MB/s    0:10:35 (xfr#703671, ir-chk=1093/833515)rsync: [generator] delete_file: rmdir(var/lib/docker) failed: Device or resource busy (16)
could not make way for new symlink: var/lib/docker
 34,081,807,102  98%   44.84MB/s    0:12:04 (xfr#736580, to-chk=0/867723)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
syncing /home/ to /mnt/home/...
rsync: [sender] readlink_stat("/home/anarcat/.fuse") failed: Permission denied (13)
IO error encountered -- skipping file deletion
 24,043,086,450  96%   62.03MB/s    0:06:09 (xfr#151819, ir-chk=15117/172571)
file has vanished: "/home/anarcat/.cache/mozilla/firefox/s2hwvqbu.quantum/cache2/entries/4C1FDBFEA976FF924D062FB990B24B897A77B84B"
315,423,626,507  96%   67.09MB/s    1:14:43 (xfr#2256845, to-chk=0/2994364)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]

Also note the transfer speed: we seem capped at 76MB/s, or 608Mbit/s. This is not as fast as I was expecting: the USB connection seems to be at around 5Gbps:

anarcat@curie:~$ lsusb -tv   head -4
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
    ID 1d6b:0003 Linux Foundation 3.0 root hub
     __ Port 1: Dev 4, If 0, Class=Mass Storage, Driver=uas, 5000M
        ID 0b05:1932 ASUSTek Computer, Inc.

So it shouldn't cap at that speed. It's possible the USB adapter is failing to give me the full speed though. It's not the M.2 SSD drive either, as that has a ~500MB/s bandwidth, acccording to its spec. At this point, we're about ready to do the final configuration. We drop to single user mode and do the rest of the procedure. That used to be shutdown now, but it seems like the systemd switch broke that, so now you can reboot into grub and pick the "recovery" option. Alternatively, you might try systemctl rescue, as I found out. I also wanted to copy the drive over to another new NVMe drive, but that failed: it looks like the USB controller I have doesn't work with older, non-NVME drives.

Boot configuration Now we need to enter the new system to rebuild the boot loader and initrd and so on. First, we bind mounts and chroot into the ZFS disk:

mount --rbind /dev  /mnt/dev &&
mount --rbind /proc /mnt/proc &&
mount --rbind /sys  /mnt/sys &&
chroot /mnt /bin/bash

Next we add an extra service that imports the bpool on boot, to make sure it survives a zpool.cache destruction:

cat > /etc/systemd/system/zfs-import-bpool.service <<EOF
[Unit]
DefaultDependencies=no
Before=zfs-import-scan.service
Before=zfs-import-cache.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -N -o cachefile=none bpool
# Work-around to preserve zpool cache:
ExecStartPre=-/bin/mv /etc/zfs/zpool.cache /etc/zfs/preboot_zpool.cache
ExecStartPost=-/bin/mv /etc/zfs/preboot_zpool.cache /etc/zfs/zpool.cache
[Install]
WantedBy=zfs-import.target
EOF

Enable the service:

systemctl enable zfs-import-bpool.service

I had to trim down /etc/fstab and /etc/crypttab to only contain references to the legacy filesystems (/srv is still BTRFS!). If we don't already have a tmpfs defined in /etc/fstab:

ln -s /usr/share/systemd/tmp.mount /etc/systemd/system/ &&
systemctl enable tmp.mount

Rebuild boot loader with support for ZFS, but also to workaround GRUB's missing zpool-features support:

grub-probe /boot   grep -q zfs &&
update-initramfs -c -k all &&
sed -i 's,GRUB_CMDLINE_LINUX.*,GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/debian",' /etc/default/grub &&
update-grub

For good measure, make sure the right disk is configured here, for example you might want to tag both drives in a RAID array:

dpkg-reconfigure grub-pc

Install grub to EFI while you're there:

grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=debian --recheck --no-floppy

Filesystem mount ordering. The rationale here in the OpenZFS guide is a little strange, but I don't dare ignore that.

mkdir /etc/zfs/zfs-list.cache
touch /etc/zfs/zfs-list.cache/bpool
touch /etc/zfs/zfs-list.cache/rpool
zed -F &

Verify that zed updated the cache by making sure these are not empty:

cat /etc/zfs/zfs-list.cache/bpool
cat /etc/zfs/zfs-list.cache/rpool

Once the files have data, stop zed:

fg
Press Ctrl-C.

Fix the paths to eliminate /mnt:

sed -Ei "s /mnt/? / " /etc/zfs/zfs-list.cache/*

Snapshot initial install:

zfs snapshot bpool/BOOT/debian@install
zfs snapshot rpool/ROOT/debian@install

Exit chroot:

exit

Finalizing One last sync was done in rescue mode:

for fs in /boot/ /boot/efi/ / /home/; do
    echo "syncing $fs to /mnt$fs..." && 
    rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
done

Then we unmount all filesystems:

mount   grep -v zfs   tac   awk '/\/mnt/  print $3 '   xargs -i  umount -lf  
zpool export -a

Reboot, swap the drives, and boot in ZFS. Hurray!

Benchmarks This is a test that was ran in single-user mode using fio and the Ars Technica recommended tests, which are:

Single 4KiB random write process:

 fio --name=randwrite4k1x --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1

16 parallel 64KiB random write processes:

 fio --name=randwrite64k16x --ioengine=posixaio --rw=randwrite --bs=64k --size=256m --numjobs=16 --iodepth=16 --runtime=60 --time_based --end_fsync=1

Single 1MiB random write process:

 fio --name=randwrite1m1x --ioengine=posixaio --rw=randwrite --bs=1m --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1

Strangely, that's not exactly what the author, Jim Salter, did in his actual test bench used in the ZFS benchmarking article. The first thing is there's no read test at all, which is already pretty strange. But also it doesn't include stuff like dropping caches or repeating results. So here's my variation, which i called fio-ars-bench.sh for now. It just batches a bunch of fio tests, one by one, 60 seconds each. It should take about 12 minutes to run, as there are 3 pair of tests, read/write, with and without async. My bias, before building, running and analysing those results is that ZFS should outperform the traditional stack on writes, but possibly not on reads. It's also possible it outperforms it on both, because it's a newer drive. A new test might be possible with a new external USB drive as well, although I doubt I will find the time to do this.

Results All tests were done on WD blue SN550 drives, which claims to be able to push 2400MB/s read and 1750MB/s write. An extra drive was bought to move the LVM setup from a WDC WDS500G1B0B-00AS40 SSD, a WD blue M.2 2280 SSD that was at least 5 years old, spec'd at 560MB/s read, 530MB/s write. Benchmarks were done on the M.2 SSD drive but discarded so that the drive difference is not a factor in the test. In practice, I'm going to assume we'll never reach those numbers because we're not actually NVMe (this is an old workstation!) so the bottleneck isn't the disk itself. For our purposes, it might still give us useful results.

Rescue test, LUKS/LVM/ext4 Those tests were performed with everything shutdown, after either entering the system in rescue mode, or by reaching that target with:

systemctl rescue

The network might have been started before or after the test as well:

systemctl start systemd-networkd

So it should be fairly reliable as basically nothing else is running. Raw numbers, from the ?job-curie-lvm.log, converted to MiB/s and manually merged:

test	read I/O	read IOPS	write I/O	write IOPS
rand4k4g1x	39.27	10052	212.15	54310
rand4k4g1x--fsync=1	39.29	10057	2.73	699
rand64k256m16x	1297.00	20751	1068.57	17097
rand64k256m16x--fsync=1	1290.90	20654	353.82	5661
rand1m16g1x	315.15	315	563.77	563
rand1m16g1x--fsync=1	345.88	345	157.01	157

Peaks are at about 20k IOPS and ~1.3GiB/s read, 1GiB/s write in the 64KB blocks with 16 jobs. Slowest is the random 4k block sync write at an abysmal 3MB/s and 700 IOPS The 1MB read/write tests have lower IOPS, but that is expected.

Rescue test, ZFS This test was also performed in rescue mode. Raw numbers, from the ?job-curie-zfs.log, converted to MiB/s and manually merged:

test read I/O read IOPS write I/O write IOPS

rand4k4g1x 77.20 19763 27.13 6944

rand4k4g1x--fsync=1 76.16 19495 6.53 1673

rand64k256m16x 1882.40 30118 70.58 1129

rand64k256m16x--fsync=1 1865.13 29842 71.98 1151

rand1m16g1x 921.62 921 102.21 102

rand1m16g1x--fsync=1 908.37 908 64.30 64

Peaks are at 1.8GiB/s read, also in the 64k job like above, but much faster. The write is, as expected, much slower at 70MiB/s (compared to 1GiB/s!), but it should be noted the sync write doesn't degrade performance compared to async writes (although it's still below the LVM 300MB/s).

test	read I/O	read IOPS	write I/O	write IOPS
rand4k4g1x	77.20	19763	27.13	6944
rand4k4g1x--fsync=1	76.16	19495	6.53	1673
rand64k256m16x	1882.40	30118	70.58	1129
rand64k256m16x--fsync=1	1865.13	29842	71.98	1151
rand1m16g1x	921.62	921	102.21	102
rand1m16g1x--fsync=1	908.37	908	64.30	64

Conclusions Really, ZFS has trouble performing in all write conditions. The random 4k sync write test is the only place where ZFS outperforms LVM in writes, and barely (7MiB/s vs 3MiB/s). Everywhere else, writes are much slower, sometimes by an order of magnitude. And before some ZFS zealot jumps in talking about the SLOG or some other cache that could be added to improved performance, I'll remind you that those numbers are on a bare bones NVMe drive, pretty much as fast storage as you can find on this machine. Adding another NVMe drive as a cache probably will not improve write performance here. Still, those are very different results than the tests performed by Salter which shows ZFS beating traditional configurations in all categories but uncached 4k reads (not writes!). That said, those tests are very different from the tests I performed here, where I test writes on a single disk, not a RAID array, which might explain the discrepancy. Also, note that neither LVM or ZFS manage to reach the 2400MB/s read and 1750MB/s write performance specification. ZFS does manage to reach 82% of the read performance (1973MB/s) and LVM 64% of the write performance (1120MB/s). LVM hits 57% of the read performance and ZFS hits barely 6% of the write performance. Overall, I'm a bit disappointed in the ZFS write performance here, I must say. Maybe I need to tweak the record size or some other ZFS voodoo, but I'll note that I didn't have to do any such configuration on the other side to kick ZFS in the pants...

Real world experience This section document not synthetic backups, but actual real world workloads, comparing before and after I switched my workstation to ZFS.

Docker performance I had the feeling that running some git hook (which was firing a Docker container) was "slower" somehow. It seems that, at runtime, ZFS backends are significant slower than their overlayfs/ext4 equivalent:

May 16 14:42:52 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87\x2dinit-merged.mount: Succeeded.
May 16 14:42:52 curie systemd[5161]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87\x2dinit-merged.mount: Succeeded.
May 16 14:42:52 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:53 curie dockerd[1723]: time="2022-05-16T14:42:53.087219426-04:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10 pid=151170
May 16 14:42:53 curie systemd[1]: Started libcontainer container af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10.
May 16 14:42:54 curie systemd[1]: docker-af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10.scope: Succeeded.
May 16 14:42:54 curie dockerd[1723]: time="2022-05-16T14:42:54.047297800-04:00" level=info msg="shim disconnected" id=af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10
May 16 14:42:54 curie dockerd[998]: time="2022-05-16T14:42:54.051365015-04:00" level=info msg="ignoring event" container=af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
May 16 14:42:54 curie systemd[2444]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[5161]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[2444]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:54 curie systemd[5161]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:54 curie systemd[1]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.

Translating this:

container setup: ~1 second
container runtime: ~1 second
container teardown: ~1 second
total runtime: 2-3 seconds

Obviously, those timestamps are not quite accurate enough to make precise measurements... After I switched to ZFS:

mai 30 15:31:39 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf\x2dinit.mount: Succeeded. 
mai 30 15:31:39 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf\x2dinit.mount: Succeeded. 
mai 30 15:31:40 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded. 
mai 30 15:31:40 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded. 
mai 30 15:31:41 curie dockerd[3199]: time="2022-05-30T15:31:41.551403693-04:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 pid=141080 
mai 30 15:31:41 curie systemd[1]: run-docker-runtime\x2drunc-moby-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142-runc.ZVcjvl.mount: Succeeded. 
mai 30 15:31:41 curie systemd[5287]: run-docker-runtime\x2drunc-moby-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142-runc.ZVcjvl.mount: Succeeded. 
mai 30 15:31:41 curie systemd[1]: Started libcontainer container 42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142. 
mai 30 15:31:45 curie systemd[1]: docker-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142.scope: Succeeded. 
mai 30 15:31:45 curie dockerd[3199]: time="2022-05-30T15:31:45.883019128-04:00" level=info msg="shim disconnected" id=42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 
mai 30 15:31:45 curie dockerd[1726]: time="2022-05-30T15:31:45.883064491-04:00" level=info msg="ignoring event" container=42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete" 
mai 30 15:31:45 curie systemd[1]: run-docker-netns-e45f5cf5f465.mount: Succeeded. 
mai 30 15:31:45 curie systemd[5287]: run-docker-netns-e45f5cf5f465.mount: Succeeded. 
mai 30 15:31:45 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded. 
mai 30 15:31:45 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded.

That's double or triple the run time, from 2 seconds to 6 seconds. Most of the time is spent in run time, inside the container. Here's the breakdown:

container setup: ~2 seconds
container run: ~4 seconds
container teardown: ~1 second
total run time: about ~6-7 seconds

That's a two- to three-fold increase! Clearly something is going on here that I should tweak. It's possible that code path is less optimized in Docker. I also worry about podman, but apparently it also supports ZFS backends. Possibly it would perform better, but at this stage I wouldn't have a good comparison: maybe it would have performed better on non-ZFS as well...

Interactivity While doing the offsite backups (below), the system became somewhat "sluggish". I felt everything was slow, and I estimate it introduced ~50ms latency in any input device. Arguably, those are all USB and the external drive was connected through USB, but I suspect the ZFS drivers are not as well tuned with the scheduler as the regular filesystem drivers...

Recovery procedures For test purposes, I unmounted all systems during the procedure:

umount /mnt/boot/efi /mnt/boot/run
umount -a -t zfs
zpool export -a

And disconnected the drive, to see how I would recover this system from another Linux system in case of a total motherboard failure. To import an existing pool, plug the device, then import the pool with an alternate root, so it doesn't mount over your existing filesystems, then you mount the root filesystem and all the others:

zpool import -l -a -R /mnt &&
zfs mount rpool/ROOT/debian &&
zfs mount -a &&
mount /dev/sdc2 /mnt/boot/efi &&
mount -t tmpfs tmpfs /mnt/run &&
mkdir /mnt/run/lock

Offsite backup Part of the goal of using ZFS is to simplify and harden backups. I wanted to experiment with shorter recovery times specifically both point in time recovery objective and recovery time objective and faster incremental backups. This is, therefore, part of my backup services. This section documents how an external NVMe enclosure was setup in a pool to mirror the datasets from my workstation. The final setup should include syncoid copying datasets to the backup server regularly, but I haven't finished that configuration yet.

Partitioning The above partitioning procedure used sgdisk, but I couldn't figure out how to do this with sgdisk, so this uses sfdisk to dump the partition from the first disk to an external, identical drive:

sfdisk -d /dev/nvme0n1   sfdisk --no-reread /dev/sda --force

Pool creation This is similar to the main pool creation, except we tweaked a few bits after changing the upstream procedure:

zpool create \
        -o cachefile=/etc/zfs/zpool.cache \
        -o ashift=12 -d \
        -o feature@async_destroy=enabled \
        -o feature@bookmarks=enabled \
        -o feature@embedded_data=enabled \
        -o feature@empty_bpobj=enabled \
        -o feature@enabled_txg=enabled \
        -o feature@extensible_dataset=enabled \
        -o feature@filesystem_limits=enabled \
        -o feature@hole_birth=enabled \
        -o feature@large_blocks=enabled \
        -o feature@lz4_compress=enabled \
        -o feature@spacemap_histogram=enabled \
        -o feature@zpool_checkpoint=enabled \
        -O acltype=posixacl -O xattr=sa \
        -O compression=lz4 \
        -O devices=off \
        -O relatime=on \
        -O canmount=off \
        -O mountpoint=/boot -R /mnt \
        bpool-tubman /dev/sdb3

The change from the main boot pool are:

no unicode normalization
different device path (sdb used to be the M.2 device, it's now nvme0n1)
reordered parameters

Main pool creation is:

zpool create \
        -o ashift=12 \
        -O encryption=on -O keylocation=prompt -O keyformat=passphrase \
        -O acltype=posixacl -O xattr=sa -O dnodesize=auto \
        -O compression=zstd \
        -O relatime=on \
        -O canmount=off \
        -O mountpoint=/ -R /mnt \
        rpool-tubman /dev/sdb4

First sync I used syncoid to copy all pools over to the external device. syncoid is a thing that's part of the sanoid project which is specifically designed to sync snapshots between pool, typically over SSH links but it can also operate locally. The sanoid command had a --readonly argument to simulate changes, but syncoid didn't so I tried to fix that with an upstream PR. It seems it would be better to do this by hand, but this was much easier. The full first sync was:

root@curie:/home/anarcat# ./bin/syncoid -r  bpool bpool-tubman
CRITICAL ERROR: Target bpool-tubman exists but has no snapshots matching with bpool!
                Replication to target would require destroying existing
                target. Cowardly refusing to destroy your existing target.
          NOTE: Target bpool-tubman dataset is < 64MB used - did you mistakenly run
                 zfs create bpool-tubman  on the target? ZFS initial
                replication must be to a NON EXISTENT DATASET, which will
                then be CREATED BY the initial replication process.
INFO: Sending oldest full snapshot bpool/BOOT@test (~ 42 KB) to new target filesystem:
44.2KiB 0:00:00 [4.19MiB/s] [========================================================================================================================] 103%            
INFO: Updating new target filesystem with incremental bpool/BOOT@test ... syncoid_curie_2022-05-30:12:50:39 (~ 4 KB):
2.13KiB 0:00:00 [ 114KiB/s] [===============================================================>                                                         ] 53%            
INFO: Sending oldest full snapshot bpool/BOOT/debian@install (~ 126.0 MB) to new target filesystem:
 126MiB 0:00:00 [ 308MiB/s] [=======================================================================================================================>] 100%            
INFO: Updating new target filesystem with incremental bpool/BOOT/debian@install ... syncoid_curie_2022-05-30:12:50:39 (~ 113.4 MB):
 113MiB 0:00:00 [ 315MiB/s] [=======================================================================================================================>] 100%
root@curie:/home/anarcat# ./bin/syncoid -r  rpool rpool-tubman
CRITICAL ERROR: Target rpool-tubman exists but has no snapshots matching with rpool!
                Replication to target would require destroying existing
                target. Cowardly refusing to destroy your existing target.
          NOTE: Target rpool-tubman dataset is < 64MB used - did you mistakenly run
                 zfs create rpool-tubman  on the target? ZFS initial
                replication must be to a NON EXISTENT DATASET, which will
                then be CREATED BY the initial replication process.
INFO: Sending oldest full snapshot rpool/ROOT@syncoid_curie_2022-05-30:12:50:51 (~ 69 KB) to new target filesystem:
44.2KiB 0:00:00 [2.44MiB/s] [===========================================================================>                                             ] 63%            
INFO: Sending oldest full snapshot rpool/ROOT/debian@install (~ 25.9 GB) to new target filesystem:
25.9GiB 0:03:33 [ 124MiB/s] [=======================================================================================================================>] 100%            
INFO: Updating new target filesystem with incremental rpool/ROOT/debian@install ... syncoid_curie_2022-05-30:12:50:52 (~ 3.9 GB):
3.92GiB 0:00:33 [ 119MiB/s] [======================================================================================================================>  ] 99%            
INFO: Sending oldest full snapshot rpool/home@syncoid_curie_2022-05-30:12:55:04 (~ 276.8 GB) to new target filesystem:
 277GiB 0:27:13 [ 174MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/home/root@syncoid_curie_2022-05-30:13:22:19 (~ 2.2 GB) to new target filesystem:
2.22GiB 0:00:25 [90.2MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var@syncoid_curie_2022-05-30:13:22:47 (~ 5.6 GB) to new target filesystem:
5.56GiB 0:00:32 [ 176MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/cache@syncoid_curie_2022-05-30:13:23:22 (~ 627.3 MB) to new target filesystem:
 627MiB 0:00:03 [ 169MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/lib@syncoid_curie_2022-05-30:13:23:28 (~ 69 KB) to new target filesystem:
44.2KiB 0:00:00 [1.40MiB/s] [===========================================================================>                                             ] 63%            
INFO: Sending oldest full snapshot rpool/var/lib/docker@syncoid_curie_2022-05-30:13:23:28 (~ 442.6 MB) to new target filesystem:
 443MiB 0:00:04 [ 103MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254 (~ 6.3 MB) to new target filesystem:
6.49MiB 0:00:00 [12.9MiB/s] [========================================================================================================================] 102%            
INFO: Updating new target filesystem with incremental rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254 ... syncoid_curie_2022-05-30:13:23:34 (~ 4 KB):
1.52KiB 0:00:00 [27.6KiB/s] [============================================>                                                                            ] 38%            
INFO: Sending oldest full snapshot rpool/var/lib/flatpak@syncoid_curie_2022-05-30:13:23:36 (~ 2.0 GB) to new target filesystem:
2.00GiB 0:00:17 [ 115MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/tmp@syncoid_curie_2022-05-30:13:23:55 (~ 57.0 MB) to new target filesystem:
61.8MiB 0:00:01 [45.0MiB/s] [========================================================================================================================] 108%            
INFO: Clone is recreated on target rpool-tubman/var/lib/docker/ed71ddd563a779ba6fb37b3b1d0cc2c11eca9b594e77b4b234867ebcb162b205 based on rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254
INFO: Sending oldest full snapshot rpool/var/lib/docker/ed71ddd563a779ba6fb37b3b1d0cc2c11eca9b594e77b4b234867ebcb162b205@syncoid_curie_2022-05-30:13:23:58 (~ 218.6 MB) to new target filesystem:
 219MiB 0:00:01 [ 151MiB/s] [=======================================================================================================================>] 100%

Funny how the CRITICAL ERROR doesn't actually stop syncoid and it just carries on merrily doing when it's telling you it's "cowardly refusing to destroy your existing target"... Maybe that's because my pull request broke something though... During the transfer, the computer was very sluggish: everything feels like it has ~30-50ms latency extra:

anarcat@curie:sanoid$ LANG=C top -b  -n 1   head -20
top - 13:07:05 up 6 days,  4:01,  1 user,  load average: 16.13, 16.55, 11.83
Tasks: 606 total,   6 running, 598 sleeping,   0 stopped,   2 zombie
%Cpu(s): 18.8 us, 72.5 sy,  1.2 ni,  5.0 id,  1.2 wa,  0.0 hi,  1.2 si,  0.0 st
MiB Mem :  15898.4 total,   1387.6 free,  13170.0 used,   1340.8 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   1319.8 avail Mem 
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     70 root      20   0       0      0      0 S  83.3   0.0   6:12.67 kswapd0
4024878 root      20   0  282644  96432  10288 S  44.4   0.6   0:11.43 puppet
3896136 root      20   0   35328  16528     48 S  22.2   0.1   2:08.04 mbuffer
3896135 root      20   0   10328    776    168 R  16.7   0.0   1:22.93 zfs
3896138 root      20   0   10588    788    156 R  16.7   0.0   1:49.30 zfs
    350 root       0 -20       0      0      0 R  11.1   0.0   1:03.53 z_rd_int
    351 root       0 -20       0      0      0 S  11.1   0.0   1:04.15 z_rd_int
3896137 root      20   0    4384    352    244 R  11.1   0.0   0:44.73 pv
4034094 anarcat   30  10   20028  13960   2428 S  11.1   0.1   0:00.70 mbsync
4036539 anarcat   20   0    9604   3464   2408 R  11.1   0.0   0:00.04 top
    352 root       0 -20       0      0      0 S   5.6   0.0   1:03.64 z_rd_int
    353 root       0 -20       0      0      0 S   5.6   0.0   1:03.64 z_rd_int
    354 root       0 -20       0      0      0 S   5.6   0.0   1:04.01 z_rd_int

I wonder how much of that is due to syncoid, particularly because I often saw mbuffer and pv in there which are not strictly necessary to do those kind of operations, as far as I understand. Once that's done, export the pools to disconnect the drive:

zpool export bpool-tubman
zpool export rpool-tubman

Raw disk benchmark Copied the 512GB SSD/M.2 device to another 1024GB NVMe/M.2 device:

anarcat@curie:~$ sudo dd if=/dev/sdb of=/dev/sdc bs=4M status=progress conv=fdatasync
499944259584 octets (500 GB, 466 GiB) copi s, 1713 s, 292 MB/s
119235+1 enregistrements lus
119235+1 enregistrements  crits
500107862016 octets (500 GB, 466 GiB) copi s, 1719,93 s, 291 MB/s

... while both over USB, whoohoo 300MB/s!

Monitoring ZFS should be monitoring your pools regularly. Normally, the [[!debman zed]] daemon monitors all ZFS events. It is the thing that will report when a scrub failed, for example. See this configuration guide. Scrubs should be regularly scheduled to ensure consistency of the pool. This can be done in newer zfsutils-linux versions (bullseye-backports or bookworm) with one of those, depending on the desired frequency:

systemctl enable zfs-scrub-weekly@rpool.timer --now
systemctl enable zfs-scrub-monthly@rpool.timer --now

When the scrub runs, if it finds anything it will send an event which will get picked up by the zed daemon which will then send a notification, see below for an example. TODO: deploy on curie, if possible (probably not because no RAID) TODO: this should be in Puppet

Scrub warning example So what happens when problems are found? Here's an example of how I dealt with an error I received. After setting up another server (tubman) with ZFS, I eventually ended up getting a warning from the ZFS toolchain.

Date: Sun, 09 Oct 2022 00:58:08 -0400
From: root <root@anarc.at>
To: root@anarc.at
Subject: ZFS scrub_finish event for rpool on tubman
ZFS has finished a scrub:
   eid: 39536
 class: scrub_finish
  host: tubman
  time: 2022-10-09 00:58:07-0400
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:33:57 with 0 errors on Sun Oct  9 00:58:07 2022
config:
        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb4    ONLINE       0     1     0
            sdc4    ONLINE       0     0     0
        cache
          sda3      ONLINE       0     0     0
errors: No known data errors

This, in itself, is a little worrisome. But it helpfully links to this more detailed documentation (and props up there: the link still works) which explains this is a "minor" problem (something that could be included in the report). In this case, this happened on a server setup on 2021-04-28, but the disks and server hardware are much older. The server itself (marcos v1) was built around 2011, over 10 years ago now. The hard drive in question is:

root@tubman:~# smartctl -i -qnoserial /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-15-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family:     Seagate BarraCuda 3.5
Device Model:     ST4000DM004-2CV104
Firmware Version: 0001
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5425 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Oct 11 11:02:32 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Some more SMART stats:

root@tubman:~# smartctl -a -qnoserial /dev/sdb   grep -e  Head_Flying_Hours -e Power_On_Hours -e Total_LBA -e 'Sector Sizes'
Sector Sizes:     512 bytes logical, 4096 bytes physical
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       12464 (206 202 0)
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       10966h+55m+23.757s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       21107792664
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3201579750

That's over a year of power on, which shouldn't be so bad. It has written about 10TB of data (21107792664 LBAs * 512 byte/LBA), which is about two full writes. According to its specification, this device is supposed to support 55 TB/year of writes, so we're far below spec. Note that are still far from the "non-recoverable read error per bits" spec (1 per 10E15), as we've basically read 13E12 bits (3201579750 LBAs * 512 byte/LBA = 13E12 bits). It's likely this disk was made in 2018, so it is in its fourth year. Interestingly, /dev/sdc is also a Seagate drive, but of a different series:

root@tubman:~# smartctl -qnoserial  -i /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-15-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family:     Seagate BarraCuda 3.5
Device Model:     ST4000DM004-2CV104
Firmware Version: 0001
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5425 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Oct 11 11:21:35 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

It has seen much more reads than the other disk which is also interesting:

root@tubman:~# smartctl -a -qnoserial /dev/sdc   grep -e  Head_Flying_Hours -e Power_On_Hours -e Total_LBA -e 'Sector Sizes'
Sector Sizes:     512 bytes logical, 4096 bytes physical
  9 Power_On_Hours          0x0032   059   059   000    Old_age   Always       -       36240
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       33994h+10m+52.118s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       30730174438
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       51894566538

That's 4 years of Head_Flying_Hours, and over 4 years (4 years and 48 days) of Power_On_Hours. The copyright date on that drive's specs goes back to 2016, so it's a much older drive. SMART self-test succeeded.

Remaining issues

TODO: move send/receive backups to offsite host, see also zfs for alternatives to syncoid/sanoid there

TODO: setup backup cron job (or timer?)

TODO: swap still not setup on curie, see zfs

TODO: document this somewhere: bpool and rpool are both pools and datasets. that's pretty confusing, but also very useful because it allows for pool-wide recursive snapshots, which are used for the backup system

fio improvements I really want to improve my experience with fio. Right now, I'm just cargo-culting stuff from other folks and I don't really like it. stressant is a good example of my struggles, in the sense that it doesn't really work that well for disk tests. I would love to have just a single .fio job file that lists multiple jobs to run serially. For example, this file describes the above workload pretty well:

[global]
# cargo-culting Salter
fallocate=none
ioengine=posixaio
runtime=60
time_based=1
end_fsync=1
stonewall=1
group_reporting=1
# no need to drop caches, done by default
# invalidate=1
# Single 4KiB random read/write process
[randread-4k-4g-1x]
rw=randread
bs=4k
size=4g
numjobs=1
iodepth=1
[randwrite-4k-4g-1x]
rw=randwrite
bs=4k
size=4g
numjobs=1
iodepth=1
# 16 parallel 64KiB random read/write processes:
[randread-64k-256m-16x]
rw=randread
bs=64k
size=256m
numjobs=16
iodepth=16
[randwrite-64k-256m-16x]
rw=randwrite
bs=64k
size=256m
numjobs=16
iodepth=16
# Single 1MiB random read/write process
[randread-1m-16g-1x]
rw=randread
bs=1m
size=16g
numjobs=1
iodepth=1
[randwrite-1m-16g-1x]
rw=randwrite
bs=1m
size=16g
numjobs=1
iodepth=1

... except the jobs are actually started in parallel, even though they are stonewall'd, as far as I can tell by the reports. I sent a mail to the fio mailing list for clarification. It looks like the jobs are started in parallel, but actual (correctly) run serially. It seems like this might just be a matter of reporting the right timestamps in the end, although it does feel like starting all the processes (even if not doing any work yet) could skew the results.

Hangs during procedure During the procedure, it happened a few times where any ZFS command would completely hang. It seems that using an external USB drive to sync stuff didn't work so well: sometimes it would reconnect under a different device (from sdc to sdd, for example), and this would greatly confuse ZFS. Here, for example, is sdd reappearing out of the blue:

May 19 11:22:53 curie kernel: [  699.820301] scsi host4: uas
May 19 11:22:53 curie kernel: [  699.820544] usb 2-1: authorized to connect
May 19 11:22:53 curie kernel: [  699.922433] scsi 4:0:0:0: Direct-Access     ROG      ESD-S1C          0    PQ: 0 ANSI: 6
May 19 11:22:53 curie kernel: [  699.923235] sd 4:0:0:0: Attached scsi generic sg2 type 0
May 19 11:22:53 curie kernel: [  699.923676] sd 4:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
May 19 11:22:53 curie kernel: [  699.923788] sd 4:0:0:0: [sdd] Write Protect is off
May 19 11:22:53 curie kernel: [  699.923949] sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 19 11:22:53 curie kernel: [  699.924149] sd 4:0:0:0: [sdd] Optimal transfer size 33553920 bytes
May 19 11:22:53 curie kernel: [  699.961602]  sdd: sdd1 sdd2 sdd3 sdd4
May 19 11:22:53 curie kernel: [  699.996083] sd 4:0:0:0: [sdd] Attached SCSI disk

Next time I run a ZFS command (say zpool list), the command completely hangs (D state) and this comes up in the logs:

May 19 11:34:21 curie kernel: [ 1387.914843] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=71344128 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914859] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=205565952 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914874] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=272789504 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914906] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=270336 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.914932] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=1073225728 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.914948] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=1073487872 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.915165] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=272793600 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.915183] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=339853312 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.915648] WARNING: Pool 'bpool' has encountered an uncorrectable I/O failure and has been suspended.
May 19 11:34:21 curie kernel: [ 1387.915648] 
May 19 11:37:25 curie kernel: [ 1571.558614] task:txg_sync        state:D stack:    0 pid:  997 ppid:     2 flags:0x00004000
May 19 11:37:25 curie kernel: [ 1571.558623] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.558640]  __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.558650]  schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.558670]  schedule_timeout+0x8b/0x140
May 19 11:37:25 curie kernel: [ 1571.558675]  ? __next_timer_interrupt+0x110/0x110
May 19 11:37:25 curie kernel: [ 1571.558678]  io_schedule_timeout+0x4c/0x80
May 19 11:37:25 curie kernel: [ 1571.558689]  __cv_timedwait_common+0x12b/0x160 [spl]
May 19 11:37:25 curie kernel: [ 1571.558694]  ? add_wait_queue_exclusive+0x70/0x70
May 19 11:37:25 curie kernel: [ 1571.558702]  __cv_timedwait_io+0x15/0x20 [spl]
May 19 11:37:25 curie kernel: [ 1571.558816]  zio_wait+0x129/0x2b0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.558929]  dsl_pool_sync+0x461/0x4f0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559032]  spa_sync+0x575/0xfa0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559138]  ? spa_txg_history_init_io+0x101/0x110 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559245]  txg_sync_thread+0x2e0/0x4a0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559354]  ? txg_fini+0x240/0x240 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559366]  thread_generic_wrapper+0x6f/0x80 [spl]
May 19 11:37:25 curie kernel: [ 1571.559376]  ? __thread_exit+0x20/0x20 [spl]
May 19 11:37:25 curie kernel: [ 1571.559379]  kthread+0x11b/0x140
May 19 11:37:25 curie kernel: [ 1571.559382]  ? __kthread_bind_mask+0x60/0x60
May 19 11:37:25 curie kernel: [ 1571.559386]  ret_from_fork+0x22/0x30
May 19 11:37:25 curie kernel: [ 1571.559401] task:zed             state:D stack:    0 pid: 1564 ppid:     1 flags:0x00000000
May 19 11:37:25 curie kernel: [ 1571.559404] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.559409]  __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.559412]  ? __kmalloc_node+0x141/0x2b0
May 19 11:37:25 curie kernel: [ 1571.559417]  schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.559420]  schedule_preempt_disabled+0xa/0x10
May 19 11:37:25 curie kernel: [ 1571.559424]  __mutex_lock.constprop.0+0x133/0x460
May 19 11:37:25 curie kernel: [ 1571.559435]  ? nvlist_xalloc.part.0+0x68/0xc0 [znvpair]
May 19 11:37:25 curie kernel: [ 1571.559537]  spa_all_configs+0x41/0x120 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559644]  zfs_ioc_pool_configs+0x17/0x70 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559752]  zfsdev_ioctl_common+0x697/0x870 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559758]  ? _copy_from_user+0x28/0x60
May 19 11:37:25 curie kernel: [ 1571.559860]  zfsdev_ioctl+0x53/0xe0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559866]  __x64_sys_ioctl+0x83/0xb0
May 19 11:37:25 curie kernel: [ 1571.559869]  do_syscall_64+0x33/0x80
May 19 11:37:25 curie kernel: [ 1571.559873]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 19 11:37:25 curie kernel: [ 1571.559876] RIP: 0033:0x7fcf0ef32cc7
May 19 11:37:25 curie kernel: [ 1571.559878] RSP: 002b:00007fcf0e181618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 19 11:37:25 curie kernel: [ 1571.559881] RAX: ffffffffffffffda RBX: 000055b212f972a0 RCX: 00007fcf0ef32cc7
May 19 11:37:25 curie kernel: [ 1571.559883] RDX: 00007fcf0e181640 RSI: 0000000000005a04 RDI: 000000000000000b
May 19 11:37:25 curie kernel: [ 1571.559885] RBP: 00007fcf0e184c30 R08: 00007fcf08016810 R09: 00007fcf08000080
May 19 11:37:25 curie kernel: [ 1571.559886] R10: 0000000000080000 R11: 0000000000000246 R12: 000055b212f972a0
May 19 11:37:25 curie kernel: [ 1571.559888] R13: 0000000000000000 R14: 00007fcf0e181640 R15: 0000000000000000
May 19 11:37:25 curie kernel: [ 1571.559980] task:zpool           state:D stack:    0 pid:11815 ppid:  3816 flags:0x00004000
May 19 11:37:25 curie kernel: [ 1571.559983] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.559988]  __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.559992]  schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.559995]  io_schedule+0x42/0x70
May 19 11:37:25 curie kernel: [ 1571.560004]  cv_wait_common+0xac/0x130 [spl]
May 19 11:37:25 curie kernel: [ 1571.560008]  ? add_wait_queue_exclusive+0x70/0x70
May 19 11:37:25 curie kernel: [ 1571.560118]  txg_wait_synced_impl+0xc9/0x110 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560223]  txg_wait_synced+0xc/0x40 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560325]  spa_export_common+0x4cd/0x590 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560430]  ? zfs_log_history+0x9c/0xf0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560537]  zfsdev_ioctl_common+0x697/0x870 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560543]  ? _copy_from_user+0x28/0x60
May 19 11:37:25 curie kernel: [ 1571.560644]  zfsdev_ioctl+0x53/0xe0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560649]  __x64_sys_ioctl+0x83/0xb0
May 19 11:37:25 curie kernel: [ 1571.560653]  do_syscall_64+0x33/0x80
May 19 11:37:25 curie kernel: [ 1571.560656]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 19 11:37:25 curie kernel: [ 1571.560659] RIP: 0033:0x7fdc23be2cc7
May 19 11:37:25 curie kernel: [ 1571.560661] RSP: 002b:00007ffc8c792478 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 19 11:37:25 curie kernel: [ 1571.560664] RAX: ffffffffffffffda RBX: 000055942ca49e20 RCX: 00007fdc23be2cc7
May 19 11:37:25 curie kernel: [ 1571.560666] RDX: 00007ffc8c792490 RSI: 0000000000005a03 RDI: 0000000000000003
May 19 11:37:25 curie kernel: [ 1571.560667] RBP: 00007ffc8c795e80 R08: 00000000ffffffff R09: 00007ffc8c792310
May 19 11:37:25 curie kernel: [ 1571.560669] R10: 000055942ca49e30 R11: 0000000000000246 R12: 00007ffc8c792490
May 19 11:37:25 curie kernel: [ 1571.560671] R13: 000055942ca49e30 R14: 000055942aed2c20 R15: 00007ffc8c795a40

Here's another example, where you see the USB controller bleeping out and back into existence:

mai 19 11:38:39 curie kernel: usb 2-1: USB disconnect, device number 2
mai 19 11:38:39 curie kernel: sd 4:0:0:0: [sdd] Synchronizing SCSI cache
mai 19 11:38:39 curie kernel: sd 4:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
mai 19 11:39:25 curie kernel: INFO: task zed:1564 blocked for more than 241 seconds.
mai 19 11:39:25 curie kernel:       Tainted: P          IOE     5.10.0-14-amd64 #1 Debian 5.10.113-1
mai 19 11:39:25 curie kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mai 19 11:39:25 curie kernel: task:zed             state:D stack:    0 pid: 1564 ppid:     1 flags:0x00000000
mai 19 11:39:25 curie kernel: Call Trace:
mai 19 11:39:25 curie kernel:  __schedule+0x282/0x870
mai 19 11:39:25 curie kernel:  ? __kmalloc_node+0x141/0x2b0
mai 19 11:39:25 curie kernel:  schedule+0x46/0xb0
mai 19 11:39:25 curie kernel:  schedule_preempt_disabled+0xa/0x10
mai 19 11:39:25 curie kernel:  __mutex_lock.constprop.0+0x133/0x460
mai 19 11:39:25 curie kernel:  ? nvlist_xalloc.part.0+0x68/0xc0 [znvpair]
mai 19 11:39:25 curie kernel:  spa_all_configs+0x41/0x120 [zfs]
mai 19 11:39:25 curie kernel:  zfs_ioc_pool_configs+0x17/0x70 [zfs]
mai 19 11:39:25 curie kernel:  zfsdev_ioctl_common+0x697/0x870 [zfs]
mai 19 11:39:25 curie kernel:  ? _copy_from_user+0x28/0x60
mai 19 11:39:25 curie kernel:  zfsdev_ioctl+0x53/0xe0 [zfs]
mai 19 11:39:25 curie kernel:  __x64_sys_ioctl+0x83/0xb0
mai 19 11:39:25 curie kernel:  do_syscall_64+0x33/0x80
mai 19 11:39:25 curie kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
mai 19 11:39:25 curie kernel: RIP: 0033:0x7fcf0ef32cc7
mai 19 11:39:25 curie kernel: RSP: 002b:00007fcf0e181618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
mai 19 11:39:25 curie kernel: RAX: ffffffffffffffda RBX: 000055b212f972a0 RCX: 00007fcf0ef32cc7
mai 19 11:39:25 curie kernel: RDX: 00007fcf0e181640 RSI: 0000000000005a04 RDI: 000000000000000b
mai 19 11:39:25 curie kernel: RBP: 00007fcf0e184c30 R08: 00007fcf08016810 R09: 00007fcf08000080
mai 19 11:39:25 curie kernel: R10: 0000000000080000 R11: 0000000000000246 R12: 000055b212f972a0
mai 19 11:39:25 curie kernel: R13: 0000000000000000 R14: 00007fcf0e181640 R15: 0000000000000000
mai 19 11:39:25 curie kernel: INFO: task zpool:11815 blocked for more than 241 seconds.
mai 19 11:39:25 curie kernel:       Tainted: P          IOE     5.10.0-14-amd64 #1 Debian 5.10.113-1
mai 19 11:39:25 curie kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mai 19 11:39:25 curie kernel: task:zpool           state:D stack:    0 pid:11815 ppid:  2621 flags:0x00004004
mai 19 11:39:25 curie kernel: Call Trace:
mai 19 11:39:25 curie kernel:  __schedule+0x282/0x870
mai 19 11:39:25 curie kernel:  schedule+0x46/0xb0
mai 19 11:39:25 curie kernel:  io_schedule+0x42/0x70
mai 19 11:39:25 curie kernel:  cv_wait_common+0xac/0x130 [spl]
mai 19 11:39:25 curie kernel:  ? add_wait_queue_exclusive+0x70/0x70
mai 19 11:39:25 curie kernel:  txg_wait_synced_impl+0xc9/0x110 [zfs]
mai 19 11:39:25 curie kernel:  txg_wait_synced+0xc/0x40 [zfs]
mai 19 11:39:25 curie kernel:  spa_export_common+0x4cd/0x590 [zfs]
mai 19 11:39:25 curie kernel:  ? zfs_log_history+0x9c/0xf0 [zfs]
mai 19 11:39:25 curie kernel:  zfsdev_ioctl_common+0x697/0x870 [zfs]
mai 19 11:39:25 curie kernel:  ? _copy_from_user+0x28/0x60
mai 19 11:39:25 curie kernel:  zfsdev_ioctl+0x53/0xe0 [zfs]
mai 19 11:39:25 curie kernel:  __x64_sys_ioctl+0x83/0xb0
mai 19 11:39:25 curie kernel:  do_syscall_64+0x33/0x80
mai 19 11:39:25 curie kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
mai 19 11:39:25 curie kernel: RIP: 0033:0x7fdc23be2cc7
mai 19 11:39:25 curie kernel: RSP: 002b:00007ffc8c792478 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
mai 19 11:39:25 curie kernel: RAX: ffffffffffffffda RBX: 000055942ca49e20 RCX: 00007fdc23be2cc7
mai 19 11:39:25 curie kernel: RDX: 00007ffc8c792490 RSI: 0000000000005a03 RDI: 0000000000000003
mai 19 11:39:25 curie kernel: RBP: 00007ffc8c795e80 R08: 00000000ffffffff R09: 00007ffc8c792310
mai 19 11:39:25 curie kernel: R10: 000055942ca49e30 R11: 0000000000000246 R12: 00007ffc8c792490
mai 19 11:39:25 curie kernel: R13: 000055942ca49e30 R14: 000055942aed2c20 R15: 00007ffc8c795a40

I understand those are rather extreme conditions: I would fully expect the pool to stop working if the underlying drives disappear. What doesn't seem acceptable is that a command would completely hang like this.

References See the zfs documentation for more information about ZFS, and tubman for another installation and migration procedure.

5 November 2022

Anuradha Weeraman: Getting started with Linkerd

If you ve done anything in the Kubernetes space in recent years, you ve most likely come across the words Service Mesh . It s backed by a set of mature technologies that provides cross-cutting networking, security, infrastructure capabilities to be used by workloads running in Kubernetes in a manner that is transparent to the actual workload. This abstraction enables application developers to not worry about building in otherwise sophisticated capabilities for networking, routing, circuit-breaking and security, and simply rely on the services offered by the service mesh.In this post, I ll be covering Linkerd, which is an alternative to Istio. It has gone through a significant re-write when it transitioned from the JVM to a Go-based Control Plane and a Rust-based Data Plane a few years back and is now a part of the CNCF and is backed by Buoyant. It has proven itself widely for use in production workloads and has a healthy community and release cadence.It achieves this with a side-car container that communicates with a Linkerd control plane that allows central management of policy, telemetry, mutual TLS, traffic routing, shaping, retries, load balancing, circuit-breaking and other cross-cutting concerns before the traffic hits the container. This has made the task of implementing the application services much simpler as it is managed by container orchestrator and service mesh. I covered Istio in a prior post a few years back, and much of the content is still applicable for this post, if you d like to have a look.Here are the broad architectural components of Linkerd:

The components are separated into the control plane and the data plane.The control plane components live in its own namespace and consists of a controller that the Linkerd CLI interacts with via the Kubernetes API. The destination service is used for service discovery, TLS identity, policy on access control for inter-service communication and service profile information on routing, retries, timeouts. The identity service acts as the Certificate Authority which responds to Certificate Signing Requests (CSRs) from proxies for initialization and for service-to-service encrypted traffic. The proxy injector is an admission webhook that injects the Linkerd proxy side car and the init container automatically into a pod when the linkerd.io/inject: enabled is available on the namespace or workload.On the data plane side are two components. First, the init container, which is responsible for automatically forwarding incoming and outgoing traffic through the Linkerd proxy via iptables rules. Second, the Linkerd proxy, which is a lightweight micro-proxy written in Rust, is the data plane itself.I will be walking you through the setup of Linkerd (2.12.2 at the time of writing) on a Kubernetes cluster.Let s see what s running on the cluster currently. This assumes you have a cluster running and kubectl is installed and available on the PATH.

$ kubectl get pods -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS       AGE
kube-system   calico-kube-controllers-59697b644f-7fsln   1/1     Running   2 (119m ago)   7d
kube-system   calico-node-6ptsh                          1/1     Running   2 (119m ago)   7d
kube-system   calico-node-7x5j8                          1/1     Running   2 (119m ago)   7d
kube-system   calico-node-qlnf6                          1/1     Running   2 (119m ago)   7d
kube-system   coredns-565d847f94-79jlw                   1/1     Running   2 (119m ago)   7d
kube-system   coredns-565d847f94-fqwn4                   1/1     Running   2 (119m ago)   7d
kube-system   etcd-k8s-master                            1/1     Running   2 (119m ago)   7d
kube-system   kube-apiserver-k8s-master                  1/1     Running   2 (119m ago)   7d
kube-system   kube-controller-manager-k8s-master         1/1     Running   2 (119m ago)   7d
kube-system   kube-proxy-4n9b7                           1/1     Running   2 (119m ago)   7d
kube-system   kube-proxy-k4rzv                           1/1     Running   2 (119m ago)   7d
kube-system   kube-proxy-lz2dd                           1/1     Running   2 (119m ago)   7d
kube-system   kube-scheduler-k8s-master                  1/1     Running   2 (119m ago)   7d

The first step would be to setup the Linkerd CLI:

$ curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install   sh

On most systems, this should be sufficient to setup the CLI. You may need to restart your terminal to load the updated paths. If you have a non-standard configuration and linkerd is not found after the installation, add the following to your PATH to be able to find the cli:

export PATH=$PATH:~/.linkerd2/bin/

At this point, checking the version would give you the following:

$ linkerd version
Client version: stable-2.12.2
Server version: unavailable

Setting up Linkerd Control PlaneBefore installing Linkerd on the cluster, run the following step to check the cluster for pre-requisites:

$ linkerd check --pre
Linkerd core checks
===================

kubernetes-api
--------------
  can initialize the client
  can query the Kubernetes API

kubernetes-version
------------------
  is running the minimum Kubernetes API version
  is running the minimum kubectl version

pre-kubernetes-setup
--------------------
  control plane namespace does not already exist
  can create non-namespaced resources
  can create ServiceAccounts
  can create Services
  can create Deployments
  can create CronJobs
  can create ConfigMaps
  can create Secrets
  can read Secrets
  can read extension-apiserver-authentication configmap
  no clock skew detected

linkerd-version
---------------
  can determine the latest version
  cli is up-to-date

Status check results are

All the pre-requisites appear to be good right now, and so installation can proceed.The first step of the installation is to setup the Custom Resource Definitions (CRDs) that Linkerd requires. The linkerd cli only prints the resource YAMLs to standard output and does not create them directly in Kubernetes, so you would need to pipe the output to kubectl apply to create the resources in the cluster that you re working with.

$ linkerd install --crds   kubectl apply -f -
Rendering Linkerd CRDs...
Next, run  linkerd install   kubectl apply -f -  to install the control plane.

customresourcedefinition.apiextensions.k8s.io/authorizationpolicies.policy.linkerd.io created
customresourcedefinition.apiextensions.k8s.io/httproutes.policy.linkerd.io created
customresourcedefinition.apiextensions.k8s.io/meshtlsauthentications.policy.linkerd.io created
customresourcedefinition.apiextensions.k8s.io/networkauthentications.policy.linkerd.io created
customresourcedefinition.apiextensions.k8s.io/serverauthorizations.policy.linkerd.io created
customresourcedefinition.apiextensions.k8s.io/servers.policy.linkerd.io created
customresourcedefinition.apiextensions.k8s.io/serviceprofiles.linkerd.io created

Next, install the Linkerd control plane components in the same manner, this time without the crds switch:

$ linkerd install   kubectl apply -f -       
namespace/linkerd created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-identity created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-identity created
serviceaccount/linkerd-identity created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-destination created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-destination created
serviceaccount/linkerd-destination created
secret/linkerd-sp-validator-k8s-tls created
validatingwebhookconfiguration.admissionregistration.k8s.io/linkerd-sp-validator-webhook-config created
secret/linkerd-policy-validator-k8s-tls created
validatingwebhookconfiguration.admissionregistration.k8s.io/linkerd-policy-validator-webhook-config created
clusterrole.rbac.authorization.k8s.io/linkerd-policy created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-destination-policy created
role.rbac.authorization.k8s.io/linkerd-heartbeat created
rolebinding.rbac.authorization.k8s.io/linkerd-heartbeat created
clusterrole.rbac.authorization.k8s.io/linkerd-heartbeat created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-heartbeat created
serviceaccount/linkerd-heartbeat created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-proxy-injector created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-proxy-injector created
serviceaccount/linkerd-proxy-injector created
secret/linkerd-proxy-injector-k8s-tls created
mutatingwebhookconfiguration.admissionregistration.k8s.io/linkerd-proxy-injector-webhook-config created
configmap/linkerd-config created
secret/linkerd-identity-issuer created
configmap/linkerd-identity-trust-roots created
service/linkerd-identity created
service/linkerd-identity-headless created
deployment.apps/linkerd-identity created
service/linkerd-dst created
service/linkerd-dst-headless created
service/linkerd-sp-validator created
service/linkerd-policy created
service/linkerd-policy-validator created
deployment.apps/linkerd-destination created
cronjob.batch/linkerd-heartbeat created
deployment.apps/linkerd-proxy-injector created
service/linkerd-proxy-injector created
secret/linkerd-config-overrides created

Kubernetes will start spinning up the data plane components and you should see the following when you list the pods:

$ kubectl get pods -A
...
linkerd       linkerd-destination-67b9cc8749-xqcbx       4/4     Running   0              69s
linkerd       linkerd-identity-59b46789cc-ntfcx          2/2     Running   0              69s
linkerd       linkerd-proxy-injector-7fc85556bf-vnvw6    1/2     Running   0              69s

The components are running in the new linkerd namespace.To verify the setup, run a check:

$ linkerd check
Linkerd core checks
===================

kubernetes-api
--------------
  can initialize the client
  can query the Kubernetes API

kubernetes-version
------------------
  is running the minimum Kubernetes API version
  is running the minimum kubectl version

linkerd-existence
-----------------
  'linkerd-config' config map exists
  heartbeat ServiceAccount exist
  control plane replica sets are ready
  no unschedulable pods
  control plane pods are ready
  cluster networks contains all pods
  cluster networks contains all services

linkerd-config
--------------
  control plane Namespace exists
  control plane ClusterRoles exist
  control plane ClusterRoleBindings exist
  control plane ServiceAccounts exist
  control plane CustomResourceDefinitions exist
  control plane MutatingWebhookConfigurations exist
  control plane ValidatingWebhookConfigurations exist
  proxy-init container runs as root user if docker container runtime is used

linkerd-identity
----------------
  certificate config is valid
  trust anchors are using supported crypto algorithm
  trust anchors are within their validity period
  trust anchors are valid for at least 60 days
  issuer cert is using supported crypto algorithm
  issuer cert is within its validity period
  issuer cert is valid for at least 60 days
  issuer cert is issued by the trust anchor

linkerd-webhooks-and-apisvc-tls
-------------------------------
  proxy-injector webhook has valid cert
  proxy-injector cert is valid for at least 60 days
  sp-validator webhook has valid cert
  sp-validator cert is valid for at least 60 days
  policy-validator webhook has valid cert
  policy-validator cert is valid for at least 60 days

linkerd-version
---------------
  can determine the latest version
  cli is up-to-date

control-plane-version
---------------------
  can retrieve the control plane version
  control plane is up-to-date
  control plane and cli versions match

linkerd-control-plane-proxy
---------------------------
  control plane proxies are healthy
  control plane proxies are up-to-date
  control plane proxies and cli versions match

Status check results are

Everything looks good.Setting up the Viz ExtensionAt this point, the required components for the service mesh are setup, but let s also install the viz extension, which provides a good visualization capabilities that will come in handy subsequently. Once again, linkerd uses the same pattern for installing the extension.

$ linkerd viz install   kubectl apply -f -
namespace/linkerd-viz created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-viz-metrics-api created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-viz-metrics-api created
serviceaccount/metrics-api created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-viz-prometheus created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-viz-prometheus created
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-viz-tap created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-viz-tap-admin created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-viz-tap created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-viz-tap-auth-delegator created
serviceaccount/tap created
rolebinding.rbac.authorization.k8s.io/linkerd-linkerd-viz-tap-auth-reader created
secret/tap-k8s-tls created
apiservice.apiregistration.k8s.io/v1alpha1.tap.linkerd.io created
role.rbac.authorization.k8s.io/web created
rolebinding.rbac.authorization.k8s.io/web created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-viz-web-check created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-viz-web-check created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-viz-web-admin created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-viz-web-api created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-viz-web-api created
serviceaccount/web created
server.policy.linkerd.io/admin created
authorizationpolicy.policy.linkerd.io/admin created
networkauthentication.policy.linkerd.io/kubelet created
server.policy.linkerd.io/proxy-admin created
authorizationpolicy.policy.linkerd.io/proxy-admin created
service/metrics-api created
deployment.apps/metrics-api created
server.policy.linkerd.io/metrics-api created
authorizationpolicy.policy.linkerd.io/metrics-api created
meshtlsauthentication.policy.linkerd.io/metrics-api-web created
configmap/prometheus-config created
service/prometheus created
deployment.apps/prometheus created
service/tap created
deployment.apps/tap created
server.policy.linkerd.io/tap-api created
authorizationpolicy.policy.linkerd.io/tap created
clusterrole.rbac.authorization.k8s.io/linkerd-tap-injector created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-tap-injector created
serviceaccount/tap-injector created
secret/tap-injector-k8s-tls created
mutatingwebhookconfiguration.admissionregistration.k8s.io/linkerd-tap-injector-webhook-config created
service/tap-injector created
deployment.apps/tap-injector created
server.policy.linkerd.io/tap-injector-webhook created
authorizationpolicy.policy.linkerd.io/tap-injector created
networkauthentication.policy.linkerd.io/kube-api-server created
service/web created
deployment.apps/web created
serviceprofile.linkerd.io/metrics-api.linkerd-viz.svc.cluster.local created
serviceprofile.linkerd.io/prometheus.linkerd-viz.svc.cluster.local created

A few seconds later, you should see the following in your pod list:

$ kubectl get pods -A
...
linkerd-viz   prometheus-b5865f776-w5ssf                 1/2     Running   0              35s
linkerd-viz   tap-64f5c8597b-rqgbk                       2/2     Running   0              35s
linkerd-viz   tap-injector-7c75cfff4c-wl9mx              2/2     Running   0              34s
linkerd-viz   web-8c444745-jhzr5                         2/2     Running   0              34s

The viz components live in the linkerd-viz namespace.You can now checkout the viz dashboard:

$ linkerd viz dashboard
Linkerd dashboard available at:
http://localhost:50750
Grafana dashboard available at:
http://localhost:50750/grafana
Opening Linkerd dashboard in the default browser
Opening in existing browser session.

The Meshed column indicates the workload that is currently integrated with the Linkerd control plane. As you can see, there are no application deployments right now that are running.Injecting the Linkerd Data Plane componentsThere are two ways to integrate Linkerd to the application containers:1 by manually injecting the Linkerd data plane components
2 by instructing Kubernetes to automatically inject the data plane componentsInject Linkerd data plane manuallyLet s try the first option. Below is a simple nginx-app that I will deploy into the cluster:

$ cat deploy.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

$ kubectl apply -f deploy.yaml

Back in the viz dashboard, I do see the workload deployed, but it isn t currently communicating with the Linkerd control plane, and so doesn t show any metrics, and the Meshed count is 0:

Looking at the Pod s deployment YAML, I can see that it only includes the nginx container:

$ kubectl get pod nginx-deployment-cd55c47f5-cgxw2 -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: aee0295dda906f7935ce5c150ae30360005f5330e98c75a550b7cc0d1532f529
    cni.projectcalico.org/podIP: 172.16.36.89/32
    cni.projectcalico.org/podIPs: 172.16.36.89/32
  creationTimestamp: "2022-11-05T19:35:12Z"
  generateName: nginx-deployment-cd55c47f5-
  labels:
    app: nginx
    pod-template-hash: cd55c47f5
  name: nginx-deployment-cd55c47f5-cgxw2
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: nginx-deployment-cd55c47f5
    uid: b604f5c4-f662-4333-aaa0-bd1a2b8b08c6
  resourceVersion: "22979"
  uid: 8fe30214-491b-4753-9fb2-485b6341376c
spec:
  containers:
  - image: nginx:latest
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      protocol: TCP
    resources:  
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-2bt6z
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: k8s-node1
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:  
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-2bt6z
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-11-05T19:35:12Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-11-05T19:35:16Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-11-05T19:35:16Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-11-05T19:35:13Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://f088f200315b44cbeed16499aba9b2d1396f9f81645e53b032d4bfa44166128a
    image: docker.io/library/nginx:latest
    imageID: docker.io/library/nginx@sha256:943c25b4b66b332184d5ba6bb18234273551593016c0e0ae906bab111548239f
    lastState:  
    name: nginx
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2022-11-05T19:35:15Z"
  hostIP: 192.168.2.216
  phase: Running
  podIP: 172.16.36.89
  podIPs:
  - ip: 172.16.36.89
  qosClass: BestEffort
  startTime: "2022-11-05T19:35:12Z"

Let s directly inject the linkerd data plane into this running container. We do this by retrieving the YAML of the deployment, piping it to linkerd cli to inject the necessary components and then piping to kubectl apply the changed resources.

$ kubectl get deploy nginx-deployment -o yaml   linkerd inject -   kubectl apply -f -

deployment "nginx-deployment" injected

deployment.apps/nginx-deployment configured

Back in the viz dashboard, the workload now is integrated into Linkerd control plane.

Looking at the updated Pod definition, we see a number of changes that the linkerd has injected that allows it to integrate with the control plane. Let s have a look:

$ kubectl get pod nginx-deployment-858bdd545b-55jpf -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: 1ec3d345f859be8ead0374a7e880bcfdb9ba74a121b220a6fccbd342ac4b7ea8
    cni.projectcalico.org/podIP: 172.16.36.90/32
    cni.projectcalico.org/podIPs: 172.16.36.90/32
    linkerd.io/created-by: linkerd/proxy-injector stable-2.12.2
    linkerd.io/inject: enabled
    linkerd.io/proxy-version: stable-2.12.2
    linkerd.io/trust-root-sha256: 354fe6f49331e8e03d8fb07808e00a3e145d2661181cbfec7777b41051dc8e22
    viz.linkerd.io/tap-enabled: "true"
  creationTimestamp: "2022-11-05T19:44:15Z"
  generateName: nginx-deployment-858bdd545b-
  labels:
    app: nginx
    linkerd.io/control-plane-ns: linkerd
    linkerd.io/proxy-deployment: nginx-deployment
    linkerd.io/workload-ns: default
    pod-template-hash: 858bdd545b
  name: nginx-deployment-858bdd545b-55jpf
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: nginx-deployment-858bdd545b
    uid: 2e618972-aa10-4e35-a7dd-084853673a80
  resourceVersion: "23820"
  uid: 62f1857a-b701-4a19-8996-b5b605ff8488
spec:
  containers:
  - env:
    - name: _pod_name
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: _pod_ns
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: _pod_nodeName
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: LINKERD2_PROXY_LOG
      value: warn,linkerd=info
    - name: LINKERD2_PROXY_LOG_FORMAT
      value: plain
    - name: LINKERD2_PROXY_DESTINATION_SVC_ADDR
      value: linkerd-dst-headless.linkerd.svc.cluster.local.:8086
    - name: LINKERD2_PROXY_DESTINATION_PROFILE_NETWORKS
      value: 10.0.0.0/8,100.64.0.0/10,172.16.0.0/12,192.168.0.0/16
    - name: LINKERD2_PROXY_POLICY_SVC_ADDR
      value: linkerd-policy.linkerd.svc.cluster.local.:8090
    - name: LINKERD2_PROXY_POLICY_WORKLOAD
      value: $(_pod_ns):$(_pod_name)
    - name: LINKERD2_PROXY_INBOUND_DEFAULT_POLICY
      value: all-unauthenticated
    - name: LINKERD2_PROXY_POLICY_CLUSTER_NETWORKS
      value: 10.0.0.0/8,100.64.0.0/10,172.16.0.0/12,192.168.0.0/16
    - name: LINKERD2_PROXY_INBOUND_CONNECT_TIMEOUT
      value: 100ms
    - name: LINKERD2_PROXY_OUTBOUND_CONNECT_TIMEOUT
      value: 1000ms
    - name: LINKERD2_PROXY_CONTROL_LISTEN_ADDR
      value: 0.0.0.0:4190
    - name: LINKERD2_PROXY_ADMIN_LISTEN_ADDR
      value: 0.0.0.0:4191
    - name: LINKERD2_PROXY_OUTBOUND_LISTEN_ADDR
      value: 127.0.0.1:4140
    - name: LINKERD2_PROXY_INBOUND_LISTEN_ADDR
      value: 0.0.0.0:4143
    - name: LINKERD2_PROXY_INBOUND_IPS
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIPs
    - name: LINKERD2_PROXY_INBOUND_PORTS
      value: "80"
    - name: LINKERD2_PROXY_DESTINATION_PROFILE_SUFFIXES
      value: svc.cluster.local.
    - name: LINKERD2_PROXY_INBOUND_ACCEPT_KEEPALIVE
      value: 10000ms
    - name: LINKERD2_PROXY_OUTBOUND_CONNECT_KEEPALIVE
      value: 10000ms
    - name: LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION
      value: 25,587,3306,4444,5432,6379,9300,11211
    - name: LINKERD2_PROXY_DESTINATION_CONTEXT
      value:  
         "ns":"$(_pod_ns)", "nodeName":"$(_pod_nodeName)" 
    - name: _pod_sa
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.serviceAccountName
    - name: _l5d_ns
      value: linkerd
    - name: _l5d_trustdomain
      value: cluster.local
    - name: LINKERD2_PROXY_IDENTITY_DIR
      value: /var/run/linkerd/identity/end-entity
    - name: LINKERD2_PROXY_IDENTITY_TRUST_ANCHORS
      value:  
        -----BEGIN CERTIFICATE-----
        MIIBiDCCAS6gAwIBAgIBATAKBggqhkjOPQQDAjAcMRowGAYDVQQDExFpZGVudGl0
        eS5saW5rZXJkLjAeFw0yMjExMDUxOTIxMDlaFw0yMzExMDUxOTIxMjlaMBwxGjAY
        BgNVBAMTEWlkZW50aXR5LmxpbmtlcmQuMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcD
        QgAE8AgxbWWa1qgEgN3ykFAOJ3sw9nSugUk1N5Qfvo6jXX/8/TZUW0ddko/N71+H
        EcKc72kK0tlclj8jDi3pzJ4C0KNhMF8wDgYDVR0PAQH/BAQDAgEGMB0GA1UdJQQW
        MBQGCCsGAQUFBwMBBggrBgEFBQcDAjAPBgNVHRMBAf8EBTADAQH/MB0GA1UdDgQW
        BBThSr0yAj5joW7pj/NZPYcfIIepbzAKBggqhkjOPQQDAgNIADBFAiAomg0TVn6N
        UxhOyzZdg848lAvH0Io9Ra/Ef2hxZGN0LgIhAIKjrsgDUqZA8XHiiciYYicxFnKr
        Tw5yj9gBhVAgYCaB
        -----END CERTIFICATE-----
    - name: LINKERD2_PROXY_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/tokens/linkerd-identity-token
    - name: LINKERD2_PROXY_IDENTITY_SVC_ADDR
      value: linkerd-identity-headless.linkerd.svc.cluster.local.:8080
    - name: LINKERD2_PROXY_IDENTITY_LOCAL_NAME
      value: $(_pod_sa).$(_pod_ns).serviceaccount.identity.linkerd.cluster.local
    - name: LINKERD2_PROXY_IDENTITY_SVC_NAME
      value: linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local
    - name: LINKERD2_PROXY_DESTINATION_SVC_NAME
      value: linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
    - name: LINKERD2_PROXY_POLICY_SVC_NAME
      value: linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local
    - name: LINKERD2_PROXY_TAP_SVC_NAME
      value: tap.linkerd-viz.serviceaccount.identity.linkerd.cluster.local
    image: cr.l5d.io/linkerd/proxy:stable-2.12.2
    imagePullPolicy: IfNotPresent
    lifecycle:
      postStart:
        exec:
          command:
          - /usr/lib/linkerd/linkerd-await
          - --timeout=2m
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /live
        port: 4191
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: linkerd-proxy
    ports:
    - containerPort: 4143
      name: linkerd-proxy
      protocol: TCP
    - containerPort: 4191
      name: linkerd-admin
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /ready
        port: 4191
        scheme: HTTP
      initialDelaySeconds: 2
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:  
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      runAsUser: 2102
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /var/run/linkerd/identity/end-entity
      name: linkerd-identity-end-entity
    - mountPath: /var/run/secrets/tokens
      name: linkerd-identity-token
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-9zpnn
      readOnly: true
  - image: nginx:latest
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      protocol: TCP
    resources:  
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-9zpnn
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - args:
    - --incoming-proxy-port
    - "4143"
    - --outgoing-proxy-port
    - "4140"
    - --proxy-uid
    - "2102"
    - --inbound-ports-to-ignore
    - 4190,4191,4567,4568
    - --outbound-ports-to-ignore
    - 4567,4568
    image: cr.l5d.io/linkerd/proxy-init:v2.0.0
    imagePullPolicy: IfNotPresent
    name: linkerd-init
    resources:
      limits:
        cpu: 100m
        memory: 20Mi
      requests:
        cpu: 100m
        memory: 20Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
      privileged: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 65534
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /run
      name: linkerd-proxy-init-xtables-lock
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-9zpnn
      readOnly: true
  nodeName: k8s-node1
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:  
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-9zpnn
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
  - emptyDir:  
    name: linkerd-proxy-init-xtables-lock
  - emptyDir:
      medium: Memory
    name: linkerd-identity-end-entity
  - name: linkerd-identity-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: identity.l5d.io
          expirationSeconds: 86400
          path: linkerd-identity-token
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-11-05T19:44:16Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-11-05T19:44:19Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-11-05T19:44:19Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-11-05T19:44:15Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://62028867c48aaa726df48249a27c52cd8820cd33e8e5695ad0d322540924754e
    image: cr.l5d.io/linkerd/proxy:stable-2.12.2
    imageID: cr.l5d.io/linkerd/proxy@sha256:787db5055b2a46a3c4318ef3b632461261f81254c8e47bf4b9b8dab2c42575e4
    lastState:  
    name: linkerd-proxy
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2022-11-05T19:44:16Z"
  - containerID: containerd://8f8ce663c19360a7b6868ace68a4a5119f0b18cd57ffebcc2d19331274038381
    image: docker.io/library/nginx:latest
    imageID: docker.io/library/nginx@sha256:943c25b4b66b332184d5ba6bb18234273551593016c0e0ae906bab111548239f
    lastState:  
    name: nginx
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2022-11-05T19:44:19Z"
  hostIP: 192.168.2.216
  initContainerStatuses:
  - containerID: containerd://c0417ea9c8418ab296bf86077e81c5d8be06fe9b87390c138d1c5d7b73cc577c
    image: cr.l5d.io/linkerd/proxy-init:v2.0.0
    imageID: cr.l5d.io/linkerd/proxy-init@sha256:7d5e66b9e176b1ebbdd7f40b6385d1885e82c80a06f4c6af868247bb1dffe262
    lastState:  
    name: linkerd-init
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: containerd://c0417ea9c8418ab296bf86077e81c5d8be06fe9b87390c138d1c5d7b73cc577c
        exitCode: 0
        finishedAt: "2022-11-05T19:44:16Z"
        reason: Completed
        startedAt: "2022-11-05T19:44:15Z"
  phase: Running
  podIP: 172.16.36.90
  podIPs:
  - ip: 172.16.36.90
  qosClass: Burstable
  startTime: "2022-11-05T19:44:15Z"

At this point, the necessary components are setup for you to explore Linkerd further. You can also try out the jaeger and multicluster extensions, similar to the process of installing and using the viz extension and try out their capabilities.Inject Linkerd data plane automaticallyIn this approach, we shall we how to instruct Kubernetes to automatically inject the Linkerd data plane to workloads at deployment time.We can achieve this by adding the linkerd.io/inject annotation to the deployment descriptor which causes the proxy injector admission hook to execute and inject linkerd data plane components automatically at the time of deployment.

$ cat deploy.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        linkerd.io/inject: enabled
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

This annotation can also be specified at the namespace level to affect all the workloads within the namespace. Note that any resources created before the annotation was added to the namespace will require a rollout restart to trigger the injection of the Linkerd components.Uninstalling LinkerdNow that we have walked through the installation and setup process of Linkerd, let s also cover how to remove it from the infrastructure and go back to the state prior to its installation.The first step would be to remove extensions, such as viz.

$ linkerd viz uninstall   kubectl delete -f -
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-viz-metrics-api" deleted
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-viz-prometheus" deleted
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-viz-tap" deleted
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-viz-tap-admin" deleted
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-viz-web-api" deleted
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-viz-web-check" deleted
clusterrole.rbac.authorization.k8s.io "linkerd-tap-injector" deleted
clusterrolebinding.rbac.authorization.k8s.io "linkerd-linkerd-viz-metrics-api" deleted
clusterrolebinding.rbac.authorization.k8s.io "linkerd-linkerd-viz-prometheus" deleted
clusterrolebinding.rbac.authorization.k8s.io "linkerd-linkerd-viz-tap" deleted
clusterrolebinding.rbac.authorization.k8s.io "linkerd-linkerd-viz-tap-auth-delegator" deleted
clusterrolebinding.rbac.authorization.k8s.io "linkerd-linkerd-viz-web-admin" deleted
clusterrolebinding.rbac.authorization.k8s.io "linkerd-linkerd-viz-web-api" deleted
clusterrolebinding.rbac.authorization.k8s.io "linkerd-linkerd-viz-web-check" deleted
clusterrolebinding.rbac.authorization.k8s.io "linkerd-tap-injector" deleted
role.rbac.authorization.k8s.io "web" deleted
rolebinding.rbac.authorization.k8s.io "linkerd-linkerd-viz-tap-auth-reader" deleted
rolebinding.rbac.authorization.k8s.io "web" deleted
apiservice.apiregistration.k8s.io "v1alpha1.tap.linkerd.io" deleted
mutatingwebhookconfiguration.admissionregistration.k8s.io "linkerd-tap-injector-webhook-config" deleted
namespace "linkerd-viz" deleted
authorizationpolicy.policy.linkerd.io "admin" deleted
authorizationpolicy.policy.linkerd.io "metrics-api" deleted
authorizationpolicy.policy.linkerd.io "proxy-admin" deleted
authorizationpolicy.policy.linkerd.io "tap" deleted
authorizationpolicy.policy.linkerd.io "tap-injector" deleted
server.policy.linkerd.io "admin" deleted
server.policy.linkerd.io "metrics-api" deleted
server.policy.linkerd.io "proxy-admin" deleted
server.policy.linkerd.io "tap-api" deleted
server.policy.linkerd.io "tap-injector-webhook" deleted

In order to uninstall the control plane, you would need to first uninject the Linkerd control plane components from any existing running pods by:

$ kubectl get deployments
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   2/2     2            2           10m

$ kubectl get deployment nginx-deployment -o yaml   linkerd uninject -   kubectl apply -f -

deployment "nginx-deployment" uninjected

deployment.apps/nginx-deployment configured

Now you can delete the control plane.

$ linkerd uninstall   kubectl delete -f -
clusterrole.rbac.authorization.k8s.io "linkerd-heartbeat" deleted
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-destination" deleted
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-identity" deleted
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-proxy-injector" deleted
clusterrole.rbac.authorization.k8s.io "linkerd-policy" deleted
clusterrolebinding.rbac.authorization.k8s.io "linkerd-destination-policy" deleted
clusterrolebinding.rbac.authorization.k8s.io "linkerd-heartbeat" deleted
clusterrolebinding.rbac.authorization.k8s.io "linkerd-linkerd-destination" deleted
clusterrolebinding.rbac.authorization.k8s.io "linkerd-linkerd-identity" deleted
clusterrolebinding.rbac.authorization.k8s.io "linkerd-linkerd-proxy-injector" deleted
role.rbac.authorization.k8s.io "linkerd-heartbeat" deleted
rolebinding.rbac.authorization.k8s.io "linkerd-heartbeat" deleted
customresourcedefinition.apiextensions.k8s.io "authorizationpolicies.policy.linkerd.io" deleted
customresourcedefinition.apiextensions.k8s.io "httproutes.policy.linkerd.io" deleted
customresourcedefinition.apiextensions.k8s.io "meshtlsauthentications.policy.linkerd.io" deleted
customresourcedefinition.apiextensions.k8s.io "networkauthentications.policy.linkerd.io" deleted
customresourcedefinition.apiextensions.k8s.io "serverauthorizations.policy.linkerd.io" deleted
customresourcedefinition.apiextensions.k8s.io "servers.policy.linkerd.io" deleted
customresourcedefinition.apiextensions.k8s.io "serviceprofiles.linkerd.io" deleted
mutatingwebhookconfiguration.admissionregistration.k8s.io "linkerd-proxy-injector-webhook-config" deleted
validatingwebhookconfiguration.admissionregistration.k8s.io "linkerd-policy-validator-webhook-config" deleted
validatingwebhookconfiguration.admissionregistration.k8s.io "linkerd-sp-validator-webhook-config" deleted
namespace "linkerd" deleted

At this point we re back to the original state:

$ kubectl get pods -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS        AGE
default       nginx-deployment-cd55c47f5-99xf2           1/1     Running   0               82s
default       nginx-deployment-cd55c47f5-tt58t           1/1     Running   0               86s
kube-system   calico-kube-controllers-59697b644f-7fsln   1/1     Running   2 (3h39m ago)   7d1h
kube-system   calico-node-6ptsh                          1/1     Running   2 (3h39m ago)   7d1h
kube-system   calico-node-7x5j8                          1/1     Running   2 (3h39m ago)   7d1h
kube-system   calico-node-qlnf6                          1/1     Running   2 (3h39m ago)   7d1h
kube-system   coredns-565d847f94-79jlw                   1/1     Running   2 (3h39m ago)   7d2h
kube-system   coredns-565d847f94-fqwn4                   1/1     Running   2 (3h39m ago)   7d2h
kube-system   etcd-k8s-master                            1/1     Running   2 (3h39m ago)   7d2h
kube-system   kube-apiserver-k8s-master                  1/1     Running   2 (3h39m ago)   7d2h
kube-system   kube-controller-manager-k8s-master         1/1     Running   2 (3h39m ago)   7d2h
kube-system   kube-proxy-4n9b7                           1/1     Running   2 (3h39m ago)   7d2h
kube-system   kube-proxy-k4rzv                           1/1     Running   2 (3h39m ago)   7d2h
kube-system   kube-proxy-lz2dd                           1/1     Running   2 (3h39m ago)   7d2h
kube-system   kube-scheduler-k8s-master                  1/1     Running   2 (3h39m ago)   7d2h

I hope you find this useful to get you started on your journey with Linkerd. Head on over to the docs for more information, guides and best practices.

3 November 2022

Arturo Borrero Gonz lez: New OpenPGP key and new email

I m trying to replace my old OpenPGP key with a new one. The old key wasn t compromised or lost or anything bad. Is still valid, but I plan to get rid of it soon. It was created in 2013. The new key id fingerprint is: AA66280D4EF0BFCC6BFC2104DA5ECB231C8F04C4 I plan to use the new key for things like encrypted emails, uploads to the Debian archive, and more. Also, the new key includes an identity with a newer personal email address I plan to use soon: arturo.bg@arturo.bg The new key has been uploaded to some public keyservers. If you would like to sign the new key, please follow the steps in the Debian wiki.

If you are curious about what that long code block contains, check this https://cirw.in/gpg-decoder/ For the record, the old key fingerprint is: DD9861AB23DC3333892E07A968E713981D1515F8 Cheers!

1 November 2022

Jonathan Dowland: Halloween playlist 2022

I hope you had a nice Halloween! I've collected together some songs that I've enjoyed over the last couple of years that loosely fit a theme: ambient, instrumental, experimental, industrial, dark, disconcerting, etc. I've prepared a Spotify playlist of most of them, but not all. The list is inline below as well, with many (but not all) tracks linking to Bandcamp, if I could find them there. This is a bit late, sorry. If anyone listens to something here and has any feedback I'd love to hear it. (If you are reading this on an aggregation site, it's possible the embeds won't work. If so, click through to my main site) Spotify playlist: https://open.spotify.com/playlist/3bEvEguRnf9U1RFrNbv5fk?si=9084cbf78c364ac8; The list, with Bandcamp embeds where possible:

Miguel A. Ruiz Transparent (spanish ambient, March 2020) Transparent by Miguel A. Ruiz
Arushi Jain My People Have Deep Roots (ambient, drone, atmospheric, indian) Under the Lilac Sky by arushi jain
K Den - Love Horses (Ambient, builds up, never quite reaching DnB) Brabuhr Q-ih by Various Artists
Bivouac The Great Escape (drone, spoken word, ambient, psych, November 2021; not on Spotify) bivouac - S/T by bivouac
Gazelle Twin & NYX - Throne (ambient, choral, dark, drone, January 2022) Deep England by Gazelle Twin & NYX
Nurse With Wound Transfiguration 3 (Incorporating Opium Cabaret) (March 2022)
Eluvium Indoor Swimming At The Space Station (ambient, chill, long, loops, euphoric, March 2022) Indoor Swimming At The Space Station by Eluvium
Hildur Gu nad ttir The Door
Delia Derbyshire Ziwzih Ziwzih OO-OO-OO (November 2021)
Afterhere, Berenice Scott, Glenn Gregory - Butterfly (from the Vigil TV series, September 2021)
Wardruna Andvevarljod (2021)
Christina Vantzou Dvorjacked (March 2021) Colliding Wind by Concentric Records
Oranssi Pazuzu Ilmestys (January 2022) Ilmestys by Oranssi Pazuzu
Coil Tiny Golden Books Tiny Golden Books by Coil
Gazelle Twin & Trash Generation History History by Gazelle Twin & Trash Generation
Isambard Khroustaliov A Pre-History of Cybernetics (Freak Zone) (Feb 2022) A Pre-History of Cybernetics by Isambard Khroustaliov
Boards of Canada Dayvan Cowboy (via Lauren Laverne) (Jun 2022) Dayvan Cowboy by Boards of Canada
Emeka Ogboh No Countefeit (via Jamz Supernova: trancy, african beats) (Sep 2022) No Countefeit by Emeka Ogboh
Burial Streetlands (dark ambient) (Oct 2022) Streetlands by Burial
Raison D'etre - Of Dying Relics (dark industrial/ambient) (Oct 2022) Of Dying Relics by Raison D'etre
Vanishing 55 N, 5 E (Not on Spotify) 55 N, 5 E by Vanishing
Ben Salisbury, Geoff Barrow The Alien The Alien by Ben Salisbury, Geoff Barrow
Coil Strange Birds Strange Birds by Coil

Some sources

Louis-Philippe V ronneau: Montreal's Debian & Stuff - October 2022

Our local Debian user group gathered on Sunday October 30th to chat, work on Debian and do other, non-Debian related hacking :) This time around, we met at EfficiOS's¹ offices. As you can see from the following picture, it's a great place and the view they have is pretty awesome. Many thanks for hosting us! The view from EfficiOS' offices, overlooking the Mont-Royal

The view from EfficiOS' offices, overlooking the Mont-Royal

This was our 4th meeting this year and once again, attendance was great: 10 people showed up to work on various things. Following our bi-monthly schedule, our next meeting should be in December, but I'm not sure it'll happen. December can be a busy month here and I will have to poke our mailing list to see if people have the spoons for an event. This time around, I was able to get a rough log of the Debian work people did: pollo:

followed up with the DPL for upcoming Debian Python Team Sprint
opened a wishlist bug (#1023140) on the devscripts package for uscan to verify git tags signed with SSH keys
updated python-mediafile and mutagen to the latest upstream version

mjeanson:

fixed RC bug #1022419 on babeltrace
kindly hosted us, with the help of Mathieu Desnoyers

viashimo:

started updating puppet-strings to 3.0.1
was blocked in the process by tests requiring ruby-mdl, which is not packaged yet
was blocked in packaging ruby-mdl by rubocop not being >= ~1.2x

lavamind:

updated jnffi to 1.3.9, as part of the work on jruby he's been doing

anarcat:

updated g10k to latest upstream

babelouest:

updated node-jose to 4.10.4
worked on bug #1021779 in libical3

tvaz:

worked on updating apticron

As always, thanks to the Debian project for granting us a budget to buy some food!

Makers of the awesome LTTng project, amongst other things.

28 October 2022

Shirish Agarwal: Shantaram, The Pyramid, Japan s Hikikomori & Backpack

Shantaram I know I have been quite behind in review of books but then that s life. First up is actually not as much as a shocker but somewhat of a pleasant surprise. So, a bit of background before I share the news. If you have been living under a rock, then about 10-12 years ago a book called Shantaram was released. While the book is said to have been released in 2003/4 I got it in my hand around 2008/09 or somewhere around that. The book is like a good meal, a buffet. To share the synopsis, Lin a 20 something Australian guy gets involved with a girl, she encourages him to get into heroin, he becomes a heroin user. And drugs, especially hard drugs need constant replenishment, it is a chemical thing. So, to fund those cravings, he starts to steal, rising to rob a bank and while getting away shoots a cop who becomes dead. Now either he surrenders or is caught is unclear, but he is tortured in the jail. So one day, he escapes from prison, lands up at home of somebody who owes him a favor, gets some money, gets a fake passport and lands up in Mumbai/Bombay as it was then known. This is from where the actual story starts. And how a 6 foot something Australian guy relying on his street smartness and know how the transformation happens from Lin to Shantaram. Now what I have shared is perhaps just 5% of the synopsis, as shared the real story starts here. Now the good news, last week 4 episodes of Shantaram were screened by Apple TV. Interestingly, I have seen quite a number people turning up to buy or get this book and also sharing it on Goodreads. Now there seems to have been some differences from the book to TV. Now I m relying on 10-12 year back memory but IIRC Khaderbhai, one of the main characters who sort of takes Lin/Shantaram under his wing is an Indian. In the series, he is a western or at least looks western/Middle Eastern to me. Also, they have tried to reproduce 1980s in Mumbai/Bombay but dunno how accurate they were My impression of that city from couple of visits at that point in time where they were still more tongas (horse-ridden carriages), an occasional two wheelers and not many three wheelers. Although, it was one of the more turbulent times as lot of agitation for worker rights were happening around that time and a lot of industrial action. Later that led to lot of closure of manufacturing in Bombay and it became more commercial. It would be interesting to know whether they shot it in actual India or just made a set somewhere in Australia, where it possibly might have been shot. The chawl of the book needs a bit of arid land and Australia has lots of it. It is also interesting as this was a project that had who s who interested in it for a long time but somehow none of them was able to bring the project to fruition, the project seems to largely have an Australian cast as well as second generations of Indians growing in Australia. To take names, Amitabh Bacchan, Johnny Depp, Russel Crowe each of them wanted to make it into a feature film. In retrospect, it is good it was not into a movie, otherwise they would have to cut a lot of material and that perhaps wouldn t have been sufficient. Making it into a web series made sure they could have it in multiple seasons if people like it. There is a lot between now and 12 episodes to even guess till where it would leave you then. So, if you have not read the book and have some holidays coming up, can recommend it. The writing IIRC is easy and just flows. There is a bit of action but much more nuance in the book while in the web series they are naturally more about action. There is also quite a bit of philosophy between him and Kaderbhai and while the series touches upon it, it doesn t do justice but then again it is being commercially made. Read the book, see the series and share your thoughts on what you think. It is possible that the series might go up or down but am sharing from where I see it, may do another at the end of the season, depending on where they leave it and my impressions. Update A slight update from the last blog post. Seems Rishi Sunak seems would be made PM of UK. With Hunt as chancellor and Rishi Sunak, Austerity 2.0 seems complete. There have been numerous articles which share how austerity gives rises to fascism and vice-versa. History gives lot of lessons about the same. In Germany, when the economy was not good, it was all blamed on the Jews for number of years. This was the reason for rise of Hitler, and while it did go up by a bit, propaganda by him and his loyalists did the rest. And we know and have read about the Holocaust. Today quite a few Germans deny it or deny parts of it but that s how misinformation spreads. Also Hitler is looked now more as an aberration rather than something to do with the German soul. I am not gonna talk more as there is still lots to share and that actually perhaps requires its own blog post to do justice for the same.

The Pyramid by Henning Mankell I had actually wanted to review this book but then the bomb called Shantaram appeared and I had to post it above. I had read two-three books before it, but most of them were about multiple beheadings and serial killers. Enough to put anybody into depression. I do not know if modern crime needs to show crime and desperation of and to such a level. Why I and most loved and continue to love Sherlock Holmes as most stories were not about gross violence but rather a homage to the art of deduction, which pretty much seems to be missing in modern crime thrillers rather than grotesque stuff. In that, like a sort of fresh air I read/am reading the Pyramid by Henning Mankell. The book is about a character made by Monsieur Henning Mankell named Kurt Wallender. I am aware of the series called Wallender but haven t yet seen it. The book starts with Wallender as a beat cop around age 20 and on his first case. He is ambitious, wants to become a detective and has a narrow escape with death. I wouldn t go much into it as it basically gives you an idea of the character and how he thinks and what he does. He is more intuitive by nature and somewhat of a loner. Probably most detectives IRL are, dunno, have no clue. At least in the literary world it makes sense, in real world think there would be much irony for sure. This is speculation on my part, who knows. Back to the book though. The book has 5 stories a sort of prequel one could say but also not entirely true. The first case starts when he is a beat cop in 1969 and he is just a beat cop. It is a kind of a prequel and a kind of an anthology as he covers from the first case to the 1990s where he is ending his career sort of. Before I start sharing about the stories in the book, I found the foreword also quite interesting. It asks questions about the interplay of the role of welfare state and the Swedish democracy. Incidentally did watch couple of videos about a sort of mixed sort of political representation that happens in Sweden. It uses what is known as proportional representation. Ironically, Sweden made a turn to the far right this election season. The book was originally in Swedish and were translated to English by Ebba Segerberg and Laurie Thompson. While all the stories are interesting, would share the last three or at least ask the questions of intrigue. Of course, to answer them you would need to read the book So the last three stories I found the most intriguing. The first one is titled Man on the Beach. Apparently, a gentleman goes to one of the beaches, a sort of lonely beach, hails a taxi and while returning suddenly dies. The Taxi driver showing good presence of mind takes it to hospital where the gentleman is declared dead on arrival. Unlike in India, he doesn t run away but goes to the cafeteria and waits there for the cops to arrive and take his statement. Now the man is in his early 40s and looks to be fit. Upon searching his pockets he is found to relatively well-off and later it turns out he owns a couple of shops. So then here are the questions ? What was the man doing on a beach, in summer that beach is somewhat popular but other times not so much, so what was he doing there? How did he die, was it a simple heart attack or something more? If he had been drugged or something then when and how? These and more questions can be answered by reading the story Man on the Beach . 2. The death of a photographer Apparently, Kurt lives in a small town where almost all the residents have been served one way or the other by the town photographer. The man was polite and had worked for something like 40 odd years before he is killed/murdered. Apparently, he is murdered late at night. So here come the questions a. The shop doesn t even stock any cameras and his cash box has cash. Further investigation reveals it is approximate to his average takeout for the day. So if it s not for cash, then what is the motive ? b. The body was discovered by his cleaning staff who has worked for almost 20 years, 3 days a week. She has her own set of keys to come and clean the office? Did she give the keys to someone, if yes why? c. Even after investigation, there is no scandal about the man, no other woman or any vices like gambling etc. that could rack up loans. Also, nobody seems to know him and yet take him for granted till he dies. The whole thing appears to be quite strange. Again, the answers lie in the book. 3. The Pyramid Kurt is sleeping one night when the telephone rings. The scene starts with a Piper Cherokee, a single piston aircraft flying low and dropping something somewhere or getting somebody from/on the coast of Sweden. It turns and after a while crashes. Kurt is called to investigate it. Turns out, the plane was supposed to be destroyed. On crash, both the pilot and the passenger are into pieces so only dental records can prove who they are. Same day or a day or two later, two seemingly ordinary somewhat elderly women, spinsters, by all accounts, live above the shop where they sell buttons and all kinds of sewing needs of the town. They seem middle-class. Later the charred bodies of the two sisters are found :(. So here come the questions a.Did the plane drop something or pick something somebody up ? The Cherokee is a small plane so any plane field or something it could have landed up or if a place was somehow marked then could be dropped or picked up without actually landing. b. The firefighter suspects arson started at multiple places with the use of petrol? The question is why would somebody wanna do that? The sisters don t seem to be wealthy and practically everybody has bought stuff from them. They weren t popular but weren t also unpopular. c. Are the two crimes connected or unconnected? If connected, then how? d. Most important question, why the title Pyramid is given to the story. Why does the author share the name Pyramid. Does he mean the same or the original thing? He could have named it triangle. Again, answers to all the above can be found in the book. One thing I also became very aware of during reading the book that it is difficult to understand people s behavior and what they do. And this is without even any criminality involved in. Let s say for e.g. I die in some mysterious circumstances, the possibility of the police finding my actions in last days would be limited and this is when I have hearing loss. And this probably is more to do with how our minds are wired. And most people I know are much more privacy conscious/aware than I am.

Japan s Hikikomori Japan has been a curious country. It was more or less a colonizer and somewhat of a feared power till it dragged the U.S. unnecessarily in World War 2. The result of the two atom bombs and the restitution meant that Japan had to build again from the ground up. It is also in a seismically unstable place as they have frequent earthquakes although the buildings are hardened/balanced to make sure that vibrations don t tear buildings apart. Had seen years ago on Natgeo a documentary that explains all that. Apart from that, Japan was helped by the Americans and there was good kinship between them till the 1980s till it signed the Plaza Accord which enhanced asset price bubbles that eventually burst. Something from which they are smarting even today. Japan has a constitutional monarchy. A somewhat history lesson or why it exists even today can be found here. Asset price bubbles of the 1980s, more than 50 percent of the population on zero hour contracts and the rest tend to suffer from overwork. There is a term called Karoshi that explains all. An Indian pig-pen would be two, two and a half times larger than a typical Japanese home. Most Japanese live in micro-apartments called konbachiku . All of the above stresses meant that lately many young Japanese people have become Hikikomori. Bloomberg featured about the same a couple of years back. I came to know about it as many Indians are given the idea of Japan being a successful country without knowing the ills and issues it faces. Even in that most women get the wrong end of the short stick i.e. even it they manage to find jobs, it would be most back-breaking menial work. The employment statistics of Japan s internal ministry tells its own story.

If you look at the data above, it seems that the between 2002 and 2019, the share of zero hour contracts has increased while regular work has decreased. This also means that those on the bottom of the ladder can no longer afford a home. There is and was a viral video called Lost in Manboo that went viral few years ago. It is a perfect set of storms. Add to that the Fukushima nuclear incident about which I had shared a few years ago. While the workers are blamed but all design decisions are taken by the management. And as was shown in numerous movies, documentaries etc. Interestingly, and somewhat ironically, the line workers knew the correct things to do and correct decisions to take unlike the management. The shut-ins story is almost a decade or two decades old. It is similar story in South Korea but not as depressive as the in Japan. It is somewhat depressive story but needed to be shared. The stories shared in the bloomberg article makes your heart ache

Backpacks In and around 2015, I had bought a Targus backpack, very much similar to the Targus TSB194US-70 Motor 16-inch Backpack. That bag has given me a lot of comfort over the years but now has become frayed the zip sometimes work and sometimes doesn t. Unlike those days there are a bunch of companies now operating in India. There are eight different companies that I came to know about, Aircase, Harrisons Sirius, HP Oddyssey, Mokobara, Artic Hunter, Dell Pro Hybrid, Dell Roller Backpack and lastly the Decathlon Quechua Hiking backpack 32L NH Escape 500 . Now of all the above, two backpacks seem the best, the first one is Harrisons Sirius, with 45L capacity, I don t think I would need another bag at all. The runner-up is the Decathlon Quecha Hiking Backpack 32L. One of the better things in all the bags is that all have hidden pockets for easy taking in and out of passport while having being ant-theft. I do not have to stress how stressful it is to take out the passport and put it back in. Almost all the vendors have made sure that it is not a stress point anymore. The good thing about the Quecha is that they are giving 10 years warranty, the point to be asked is if that is does the warranty cover the zip. Zips are the first thing that goes out in bags.That actually has what happened to my current bag. Decathlon has a store in Wakad, Pune while I have reached out to the gentleman in charge of Harrisons India to see if they have a reseller in Pune. So hopefully, in next one week I should have a backpack that isn t spilling with things all over the place, whichever I m able to figure out.

7 October 2022

Reproducible Builds: Reproducible Builds in September 2022

Welcome to the September 2022 report from the Reproducible Builds project! In our reports we try to outline the most important things that we have been up to over the past month. As a quick recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries. If you are interested in contributing to the project, please visit our Contribute page on our website.

David A. Wheeler reported to us that the US National Security Agency (NSA), Cybersecurity and Infrastructure Security Agency (CISA) and the Office of the Director of National Intelligence (ODNI) have released a document called Securing the Software Supply Chain: Recommended Practices Guide for Developers (PDF). As David remarked in his post to our mailing list, it expressly recommends having reproducible builds as part of advanced recommended mitigations . The publication of this document has been accompanied by a press release.

Holger Levsen was made aware of a small Microsoft project called oss-reproducible. Part of, OSSGadget, a larger collection of tools for analyzing open source packages , the purpose of oss-reproducible is to:

analyze open source packages for reproducibility. We start with an existing package (for example, the NPM left-pad package, version 1.3.0), and we try to answer the question, Do the package contents authentically reflect the purported source code?

More details can be found in the README.md file within the code repository.

David A. Wheeler also pointed out that there are some potential upcoming changes to the OpenSSF Best Practices badge for open source software in relation to reproducibility. Whilst the badge programme has three certification levels ( passing , silver and gold ), the gold level includes the criterion that The project MUST have a reproducible build . David reported that some projects have argued that this reproducibility criterion should be slightly relaxed as outlined in an issue on the best-practices-badge GitHub project. Essentially, though, the claim is that the reproducibility requirement doesn t make sense for projects that do not release built software, and that timestamp differences by themselves don t necessarily indicate malicious changes. Numerous pragmatic problems around excluding timestamps were raised in the discussion of the issue.

Sonatype, a pioneer of software supply chain management , issued a press release month to report that they had found:

[ ] a massive year-over-year increase in cyberattacks aimed at open source project ecosystems. According to early data from Sonatype s 8th annual State of the Software Supply Chain Report, which will be released in full this October, Sonatype has recorded an average 700% jump in repository attacks over the last three years.

More information is available in the press release.

A number of changes were made to the Reproducible Builds website and documentation this month, including Chris Lamb adding a redirect from /projects/ to /who/ in order to keep old or archived links working [ ], Jelle van der Waa added a Rust programming language example for SOURCE_DATE_EPOCH [ ][ ] and Mattia Rizzolo included Protocol Labs amongst our project-level sponsors [ ].

Debian There was a large amount of reproducibility work taking place within Debian this month:

The `nfft` source package was removed from the archive, and now all packages in Debian bookworm now have a corresponding `.buildinfo` file. This can be confirmed and tracked on the associated page on the tests.reproducible-builds.org site.

Vagrant Cascadian announced on our mailing list an informal online sprint to help clear the huge backlog of reproducible builds patches submitted by performing NMU (Non-Maintainer Uploads). The first such sprint took place on September 22nd with the following results:

Holger Levsen:

Mailed #1010957 in `man-db` asking for an update and whether to remove the patch tag for now. This was subsequently removed and the maintainer started to address the issue.

Uploaded `gmp` to `DELAYED/15`, fixing #1009931.

Emailed #1017372 in `plymouth` and asked for the maintainer s opinion on the patch. This resulted in the maintainer improving Vagrant s original patch (and uploading it) as well as filing an issue upstream.

Uploaded `time` to `DELAYED/15`, fixing #983202.

Vagrant Cascadian:

Verify and updated patch for `mylvmbackup` (#782318)

Verified/updated patches for `libranlip`. (#788000, #846975 & #1007137)

Uploaded `libranlip` to `DELAYED/10`.

Verified patch for `cclive`. (#824501)

Uploaded `cclive` to `DELAYED/10`.

Vagrant was unable to reproduce the underlying issue within #791423 (`linuxtv-dvb-apps`) and so the bug was marked as done .

Researched #794398 (in `clhep`).

The plan is to repeat these sprints every two weeks, with the next taking place on Thursday October 6th at 16:00 UTC on the `#debian-reproducible` IRC channel.

Roland Clobus posted his 13th update of the status of reproducible Debian ISO images on our mailing list. During the last month, Roland ensured that the live images are now automatically fed to openQA for automated testing after they have been shown to be reproducible. Additionally Roland asked on the debian-devel mailing list about a way to determine the canonical timestamp of the Debian archive. [ ]

Following up on last month s work on reproducible bootstrapping, Holger Levsen filed two bugs against the debootstrap and cdebootstrap utilities. (#1019697 & #1019698)

Lastly, 44 reviews of Debian packages were added, 91 were updated and 17 were removed this month adding to our knowledge about identified issues. A number of issue types have been updated too, including the descriptions of `cmake_rpath_contains_build_path` [ ], `nondeterministic_version_generated_by_python_param` [ ] and `timestamps_in_documentation_generated_by_org_mode` [ ]. Furthermore, two new issue types were created: `build_path_used_to_determine_version_or_package_name` [ ] and `captures_build_path_via_cmake_variables` [ ].

Other distributions In openSUSE, Bernhard M. Wiedemann published his usual openSUSE monthly report.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions `222` and `223` to Debian, as well as made the following changes:

The `cbfstools` utility is now provided in Debian via the `coreboot-utils` package so we can enable that functionality within Debian. [ ]

Looked into Mach-O support.

Fixed the try.diffoscope.org service by addressing a compatibility issue between `glibc`/`seccomp` that was preventing the Docker-contained diffoscope instance from spawning any external processes whatsoever [ ]. I also updated the `requirements.txt` file, as some of the specified packages were no longer available [ ][ ].

In addition Jelle van der Waa added support for `file` version 5.43 [ ] and Mattia Rizzolo updated the packaging:

Also include `coreboot-utils` in the `Build-Depends` and `Test-Depends` fields so that it is available for tests. [ ]

Use pep517 and pip to load the requirements. [ ]

Remove packages in `Breaks`/`Replaces` that have been obsoleted since the release of Debian bullseye. [ ]

Reprotest reprotest is our end-user tool to build the same source code twice in widely and deliberate different environments, and checking whether the binaries produced by the builds have any differences. This month, reprotest version `0.7.22` was uploaded to Debian unstable by Holger Levsen, which included the following changes by Philip Hands:

Actually ensure that the `setarch(8)` utility can actually execute before including an architecture to test. [ ]

Include all files matching `.deb` in the default `artifact_pattern` in order to archive all results of the build. [ ]

Emit an error when building the Debian package if the Debian packaging version does not patch the Python version of reprotest. [ ]

Remove an unneeded invocation of the `head(1)` utility. [ ]

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Bernhard M. Wiedemann (18 bugs):

`DateTime` (fails to build in 2038)

`FreeRCT` (date-related issue)

`clanlib1` (filesystem ordering)

`cli` (fails to build in 2038)

`deepin-gettext-tools` (patch+version update toolchain sort python glob)

`mariadb` (fails to build in 2038)

`mercurial` (fails to build in 2038)

`mirrormagic` (parallelism-related issue)

`ocaml-extlib` (parallelism-related issue)

`python-xmlrpc/python-softlayer` (fails to build in 2038)

`python` (fails to build in 2038)

`q3rally` (zip-related issue)

`rnd_jue` (parallelism-related issue)

`rsync` (workaround an issue in GCC 7.x)

`scons` (`SOURCE_DATE_EPOCH`-related issue)

`stratagus` (date-related issue)

`triplane` (nondeterminism caused by uninitialised memory)

`tyrquake` (date-related issue)

Chris Lamb:

#1019382 filed against `gnome-online-accounts`.

There was renewed activity on a reproducibility-related bug in the Sphinx documentation tool this month. Originally filed in October 2021 by Chris Lamb, the bug in question relates to contents of the `LANGUAGE` environment variable inconsistently affecting the output of `objects.inv` files.

Jelle van der Waa:

`mp4v2` (date-related issue)

`mm-common` (uid/gid issue)

`aardvark-dns` (date-related issue)

Vagrant Cascadian (70 bugs!):

#1020648 filed against `extrepo-data`.

#1020650 filed against `tmpreaper`.

#1020651 filed against `xmlrpc-epi`.

#1020653 filed against `pal`.

#1020656 filed against `nvram-wakeup`.

#1020657 filed against `netris`.

#1020658 filed against `netpbm-free`.

#1020659 filed against `lookup`.

#1020660 filed against `logtools`.

#1020661 filed against `libid3tag`.

#1020662 filed against `log4cpp`.

#1020665 filed against `libimage-imlib2-perl`.

#1020668 filed against `jnettop`.

#1020670 filed against `gwaei`.

#1020671 filed against `ipfm`.

#1020672 filed against `tarlz`.

#1020673 filed against `w3cam`.

#1020674 filed against `ifstat`.

#1020715 filed against `xserver-xorg-input-joystick`.

#1020719 filed against `chibicc`.

#1020723 filed against `python-omegaconf`.

#1020724 and #1020725 filed against `snapper`.

#1020736 filed against `libreswan`.

#1020743 filed against `pure-ftpd`.

#1020748 filed against `xcolmix`.

#1020749 filed against `gigalomania`.

#1020750 filed against `xjump`.

#1020751 filed against `waili`.

#1020752 filed against `sjeng`.

#1020753 filed against `seqtk`.

#1020754 filed against `shapetools`.

#1020755 filed against `rotter`.

#1020756 filed against `rakarrack`.

#1020757 filed against `rig`.

#1020759 filed against `postal`.

#1020798 filed against `netkit-rsh`.

#1020800 filed against `libapache-mod-evasive`.

#1020804 filed against `paxctl`.

#1020805 filed against `png23d`.

#1020806 filed against `perl-byacc`.

#1020807 filed against `poster`.

#1020808 filed against `powerdebug`.

#1020809 filed against `aespipe`.

#1020810 filed against `aewm++-goodies`.

#1020811 filed against `apache-upload-progress-module`.

#1020812 filed against `ascii2binary`.

#1020813 filed against `bible-kjv`.

#1020814 filed against `dradio`.

#1020815 filed against `libapache2-mod-python`.

#1020816 filed against `tempest-for-eliza`.

#1020817 filed against `aplus-fsf`.

#1020866 filed against `wrapsrv`.

#1020867 filed against `uclibc`.

#1020870 filed against `xppaut`.

#1020872 filed against `xvier`.

#1020873 filed against `xserver-xorg-video-glide`.

#1020875 filed against `z80asm`.

#1020876 filed against `yaskkserv`.

#1020877 filed against `edid-decode`.

#1020878 filed against `dustmite`.

#1020879 filed against `dustmite`.

#1020880 filed against `libapache2-mod-authnz-pam`.

#1020881 filed against `kafs-client`.

#1020882 filed against `yaku-ns`.

#1020884 filed against `bplay`.

#1020886 filed against `chise-base`.

#1020887 filed against `checkpw`.

#1020888 filed against `clamz`.

#1020889 filed against `libapache2-mod-auth-pgsql`.

Testing framework The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. This month, however, the following changes were made:

Holger Levsen:

Add a job to build reprotest from Git [ ] and use the correct Git branch when building it [ ].

Mattia Rizzolo:

Enable syncing of results from building live Debian ISO images. [ ]

Use `scp -p` in order to preserve modification times when syncing live ISO images. [ ]

Apply the shellcheck shell script analysis tool. [ ]

In a build node wrapper script, remove some debugging code which was messing up calling `scp(1)` correctly [ ] and consquently add support to use both `scp -p` and regular `scp` [ ].

Roland Clobus:

Track and handle the case where the Debian archive gets updated between two live image builds. [ ]

Remove a call to `sudo(1)` as it is not (or no longer) required to delete old live-build results. [ ]

Contact As ever, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

IRC: `#reproducible-builds` on `irc.oftc.net`.

Twitter: @ReproBuilds

Mailing list: `rb-general@lists.reproducible-builds.org`

3 October 2022

Shirish Agarwal: Death Certificate, Legal Heir, Succession Certificate, and Indian Bureaucracy.

Death Certificate After waiting for almost two, two, and a half months, I finally got mum s death certificate last week. A part of me was saddened as it felt like I was nailing her or putting nails to the coffin or whatever it is, (even though I m an Agarwal) I just felt sad and awful. I was told just get a death certificate and your problems will be over. Some people wanted me to give some amount under the table or something which I didn t want to party of and because of that perhaps it took a month, month and a half more as I came to know later that it had been issued almost a month and a half back. The inflation over the last 8 years of the present Govt. has made the corrupt even more corrupt, all the while projecting and telling others that the others are corrupt. There had been also a few politicians who were caught red-handed but then pieces of evidence & witnesses vanish overnight. I don t really wanna go in that direction as it would make for an unpleasant reading with no solutions at all unless the present Central Govt. goes out.

Intestate and Will I came to know the word Intestate. This was a new word/term for me. A lookup told me that intestate means a person dying without putting a will. That legal term comes from U.K. law. I had read a long long time back that almost all our laws have and were made or taken from U.K. law. IIRC, massive sections of the CRPC Act even today have that colonial legacy. While in its (BJP) manifesto that had been shared with the public at the time of the election, they had shared that they will remove a whole swathe of laws that don t make sense in today s environment. But when hard and good questions were asked, they trimmed a few, modified a few, and left most of them as it is. Sadly, most of the laws that they did modify increased Government control over people instead of decreasing, It s been 8 years and yet we still don t have a Privacy law. They had made something but it was too vague and would have invited suits from day 1 so pretty much on backburner :(. A good insight into what I mean is an article in the Hindu I read a few days back. Once you read that article, I am sure you will have as many questions as I have but sadly no answers. Law is not supposed to be partisan but today it is. I could cite examples from both the U.S. and UK courts about progressive judgments or the way they go about it, but then again when our people think they know better But this again does not help me apart from setting some kind of background of where we are.) I have on this blog also shared how Africans have been setting new records in transparency and they did it almost 5 years back. For those new to the blog, African countries have been now broadcasting proceedings of their SC for almost 5 years now. I noticed it when privacy law was being debated and a few handles that I follow on Twitter and elsewhere had gone and given their submission in their SC. It was fascinating to not only hear but also read about the case from multiple viewpoints. And just to remind people, I am sharing all of this from Pune, Maharashtra which is the second-biggest city in Maharashtra that has something like six million people and probably a million or more transitory students, and casual laborers but then again that doesn t help me other than providing a kind of context to what I m sharing.. Now a couple of years back or more I had asked mum to make a will. If she wanted to bequeath something to somebody else she could do that, had shared about that. There was some article in Indian Express or elsewhere that told people what they should be doing, especially if they had cost the barrier of age 60. Now for reasons best known to her, she refused and now I have to figure out what is the right way to go about doing things.

Twitter Experiences Now before Twitter, a few people had been asking me about having a legal heir certificate, while others are asking about a succession certificate and some claim a Death Certificate is enough. Now I asked the same question on Twitter hoping at the max of 5-10 responses but was overwhelmed by the response. I got something like 50-60 odd replies. Probably, one of the better responses was given by Dr. Paras Jain who shared the following

Answer is qualified Movable assets nothing required Bank LIC flat with society nomination done nothing required except death certificate. However, each will insist on a notarized indemnity bond If the nomination is not done. Depends on whims & fancy of each mind legal heir certificate,+ all Dr. Paras Jain. (cleared up the grammar a little, otherwise, views are of Dr. Paras.) What was interesting for me is that most people just didn t give me advice, many of them also shared their own experiences or what they did or went through. I was surprised to learn e.g. that a succession certificate can take up to 6 months or more. Part of me isn t surprised to learn that as do know we have a huge pendency of cases in High Courts, District Courts leading all the way to the Supreme Court. India Today shared a brief article sharing the same and similar issues. Such delays have become far too common now

Supertech Demolition and Others Over the last couple of months, a number of high-profile demolitions have taken place and in most cases, the loss has been of homebuyers. See for e.g. the case of Supertech. A much more detailed article was penned by Moneylife. There were a few Muslims whose homes were demolished just a couple of months back that were being celebrated, but now just 2-3 days back a politician by the name of Shrikant Tyagi, a BJP leader, his flat was partly demolished and there was a lot of hue and cry. Although we shouldn t be discussing on the basis of religion but legality, somehow the idea has been put that there are two kinds of laws, one for the majority, the other for the minority. And this has been going on for the last 8 odd years, hence you see different reactions to the same incidents instead of similar reactions. In all the cases, no strictures are passed either against the Municipality or against lenders. The most obvious question, let s say for argument s sake, I was a homeowner in Supertech. I bought a flat for say 10 lakhs in 2012. According to the courts, today I am supposed to get 22 lakhs at 12% simple interest for 10 years. Let s say even if the builder was in a position and does honor the order, the homeowner will not get a house in the same area as the circle rate would probably have quadrupled by then at the very least. The circle rate alone might be the above amount. The reason is very simple, a builder buys land on the cheap when there is no development around. His/her/their whole idea is once development happens due to other builders also building flats, the whole area gets developed and they are able to sell the flats at a premium. Even Circle rates get affected as the builder pays below the table and asks the officers of the municipal authority to hike the circle rate every few months. Again, wouldn t go into much depth as the whole thing is rotten to the core. There are many such projects. I have shared Krishnaraj Rao s videos on this blog a few times. I am sure there are a few good men like him. At the end, sadly this is where we are P.S. I haven t shared any book reviews this week as this post itself has become too long. I probably may blog about a couple of books in the next couple of days, till later.

25 September 2022

Sergio Talens-Oliag: Kubernetes Static Content Server

This post describes how I ve put together a simple static content server for kubernetes clusters using a Pod with a persistent volume and multiple containers: an sftp server to manage contents, a web server to publish them with optional access control and another one to run scripts which need access to the volume filesystem. The sftp server runs using MySecureShell, the web server is nginx and the script runner uses the webhook tool to publish endpoints to call them (the calls will come from other Pods that run backend servers or are executed from Jobs or CronJobs).

HistoryThe system was developed because we had a `NodeJS` API with endpoints to upload files and store them on S3 compatible services that were later accessed via HTTPS, but the requirements changed and we needed to be able to publish folders instead of individual files using their original names and apply access restrictions using our API. Thinking about our requirements the use of a regular filesystem to keep the files and folders was a good option, as uploading and serving files is simple. For the upload I decided to use the sftp protocol, mainly because I already had an sftp container image based on mysecureshell prepared; once we settled on that we added sftp support to the API server and configured it to upload the files to our server instead of using S3 buckets. To publish the files we added a nginx container configured to work as a reverse proxy that uses the ngx_http_auth_request_module to validate access to the files (the sub request is configurable, in our deployment we have configured it to call our API to check if the user can access a given URL). Finally we added a third container when we needed to execute some tasks directly on the filesystem (using `kubectl exec` with the existing containers did not seem a good idea, as that is not supported by `CronJobs` objects, for example). The solution we found avoiding the NIH Syndrome (i.e. write our own tool) was to use the webhook tool to provide the endpoints to call the scripts; for now we have three:
one to get the disc usage of a `PATH`,
one to `hardlink` all the files that are identical on the filesystem,
one to copy files and folders from S3 buckets to our filesystem.

Container definitions

mysecureshellThe mysecureshell container can be used to provide an sftp service with multiple users (although the files are owned by the same UID and GID) using standalone containers (launched with docker or podman) or in an orchestration system like kubernetes, as we are going to do here. The image is generated using the following Dockerfile:

ARG ALPINE_VERSION=3.16.2
FROM alpine:$ALPINE_VERSION as builder
LABEL maintainer="Sergio Talens-Oliag <sto@mixinet.net>"
RUN apk update &&\
 apk add --no-cache alpine-sdk git musl-dev &&\
 git clone https://github.com/sto/mysecureshell.git &&\
 cd mysecureshell &&\
 ./configure --prefix=/usr --sysconfdir=/etc --mandir=/usr/share/man\
 --localstatedir=/var --with-shutfile=/var/lib/misc/sftp.shut --with-debug=2 &&\
 make all && make install &&\
 rm -rf /var/cache/apk/*
FROM alpine:$ALPINE_VERSION
LABEL maintainer="Sergio Talens-Oliag <sto@mixinet.net>"
COPY --from=builder /usr/bin/mysecureshell /usr/bin/mysecureshell
COPY --from=builder /usr/bin/sftp-* /usr/bin/
RUN apk update &&\
 apk add --no-cache openssh shadow pwgen &&\
 sed -i -e "s ^.*\(AuthorizedKeysFile\).*$ \1 /etc/ssh/auth_keys/%u "\
 /etc/ssh/sshd_config &&\
 mkdir /etc/ssh/auth_keys &&\
 cat /dev/null > /etc/motd &&\
 add-shell '/usr/bin/mysecureshell' &&\
 rm -rf /var/cache/apk/*
COPY bin/* /usr/local/bin/
COPY etc/sftp_config /etc/ssh/
COPY entrypoint.sh /
EXPOSE 22
VOLUME /sftp
ENTRYPOINT ["/entrypoint.sh"]
CMD ["server"]

The /etc/sftp_config file is used to configure the mysecureshell server to have all the user homes under /sftp/data, only allow them to see the files under their home directories as if it were at the root of the server and close idle connections after 5m of inactivity:

etc/sftp_config

# Default mysecureshell configuration
<Default>
   # All users will have access their home directory under /sftp/data
   Home /sftp/data/$USER
   # Log to a file inside /sftp/logs/ (only works when the directory exists)
   LogFile /sftp/logs/mysecureshell.log
   # Force users to stay in their home directory
   StayAtHome true
   # Hide Home PATH, it will be shown as /
   VirtualChroot true
   # Hide real file/directory owner (just change displayed permissions)
   DirFakeUser true
   # Hide real file/directory group (just change displayed permissions)
   DirFakeGroup true
   # We do not want users to keep forever their idle connection
   IdleTimeOut 5m
</Default>
# vim: ts=2:sw=2:et

The entrypoint.sh script is the one responsible to prepare the container for the users included on the /secrets/user_pass.txt file (creates the users with their HOME directories under /sftp/data and a /bin/false shell and creates the key files from /secrets/user_keys.txt if available). The script expects a couple of environment variables:

SFTP_UID: UID used to run the daemon and for all the files, it has to be different than 0 (all the files managed by this daemon are going to be owned by the same user and group, even if the remote users are different).
SFTP_GID: GID used to run the daemon and for all the files, it has to be different than 0.

And can use the SSH_PORT and SSH_PARAMS values if present. It also requires the following files (they can be mounted as secrets in kubernetes):

/secrets/host_keys.txt: Text file containing the ssh server keys in mime format; the file is processed using the reformime utility (the one included on busybox) and can be generated using the gen-host-keys script included on the container (it uses ssh-keygen and makemime).
/secrets/user_pass.txt: Text file containing lines of the form username:password_in_clear_text (only the users included on this file are available on the sftp server, in fact in our deployment we use only the scs user for everything).

And optionally can use another one:

/secrets/user_keys.txt: Text file that contains lines of the form username:public_ssh_ed25519_or_rsa_key; the public keys are installed on the server and can be used to log into the sftp server if the username exists on the user_pass.txt file.

The contents of the entrypoint.sh script are:

entrypoint.sh

#!/bin/sh
set -e
# ---------
# VARIABLES
# ---------
# Expects SSH_UID & SSH_GID on the environment and uses the value of the
# SSH_PORT & SSH_PARAMS variables if present
# SSH_PARAMS
SSH_PARAMS="-D -e -p $ SSH_PORT:=22  $ SSH_PARAMS "
# Fixed values
# DIRECTORIES
HOME_DIR="/sftp/data"
CONF_FILES_DIR="/secrets"
AUTH_KEYS_PATH="/etc/ssh/auth_keys"
# FILES
HOST_KEYS="$CONF_FILES_DIR/host_keys.txt"
USER_KEYS="$CONF_FILES_DIR/user_keys.txt"
USER_PASS="$CONF_FILES_DIR/user_pass.txt"
USER_SHELL_CMD="/usr/bin/mysecureshell"
# TYPES
HOST_KEY_TYPES="dsa ecdsa ed25519 rsa"
# ---------
# FUNCTIONS
# ---------
# Validate HOST_KEYS, USER_PASS, SFTP_UID and SFTP_GID
_check_environment()  
  # Check the ssh server keys ... we don't boot if we don't have them
  if [ ! -f "$HOST_KEYS" ]; then
    cat <<EOF
We need the host keys on the '$HOST_KEYS' file to proceed.
Call the 'gen-host-keys' script to create and export them on a mime file.
EOF
    exit 1
  fi
  # Check that we have users ... if we don't we can't continue
  if [ ! -f "$USER_PASS" ]; then
    cat <<EOF
We need at least the '$USER_PASS' file to provision users.
Call the 'gen-users-tar' script to create a tar file to create an archive that
contains public and private keys for users, a 'user_keys.txt' with the public
keys of the users and a 'user_pass.txt' file with random passwords for them 
(pass the list of usernames to it).
EOF
    exit 1
  fi
  # Check SFTP_UID
  if [ -z "$SFTP_UID" ]; then
    echo "The 'SFTP_UID' can't be empty, pass a 'GID'."
    exit 1
  fi
  if [ "$SFTP_UID" -eq "0" ]; then
    echo "The 'SFTP_UID' can't be 0, use a different 'UID'"
    exit 1
  fi
  # Check SFTP_GID
  if [ -z "$SFTP_GID" ]; then
    echo "The 'SFTP_GID' can't be empty, pass a 'GID'."
    exit 1
  fi
  if [ "$SFTP_GID" -eq "0" ]; then
    echo "The 'SFTP_GID' can't be 0, use a different 'GID'"
    exit 1
  fi
 
# Adjust ssh host keys
_setup_host_keys()  
  opwd="$(pwd)"
  tmpdir="$(mktemp -d)"
  cd "$tmpdir"
  ret="0"
  reformime <"$HOST_KEYS"   ret="1"
  for kt in $HOST_KEY_TYPES; do
    key="ssh_host_$ kt _key"
    pub="ssh_host_$ kt _key.pub"
    if [ ! -f "$key" ]; then
      echo "Missing '$key' file"
      ret="1"
    fi
    if [ ! -f "$pub" ]; then
      echo "Missing '$pub' file"
      ret="1"
    fi
    if [ "$ret" -ne "0" ]; then
      continue
    fi
    cat "$key" >"/etc/ssh/$key"
    chmod 0600 "/etc/ssh/$key"
    chown root:root "/etc/ssh/$key"
    cat "$pub" >"/etc/ssh/$pub"
    chmod 0600 "/etc/ssh/$pub"
    chown root:root "/etc/ssh/$pub"
  done
  cd "$opwd"
  rm -rf "$tmpdir"
  return "$ret"
 
# Create users
_setup_user_pass()  
  opwd="$(pwd)"
  tmpdir="$(mktemp -d)"
  cd "$tmpdir"
  ret="0"
  [ -d "$HOME_DIR" ]   mkdir "$HOME_DIR"
  # Make sure the data dir can be managed by the sftp user
  chown "$SFTP_UID:$SFTP_GID" "$HOME_DIR"
  # Allow the user (and root) to create directories inside the $HOME_DIR, if
  # we don't allow it the directory creation fails on EFS (AWS)
  chmod 0755 "$HOME_DIR"
  # Create users
  echo "sftp:sftp:$SFTP_UID:$SFTP_GID:::/bin/false" >"newusers.txt"
  sed -n "/^[^#]/   s/:/ /p  " "$USER_PASS"   while read -r _u _p; do
    echo "$_u:$_p:$SFTP_UID:$SFTP_GID::$HOME_DIR/$_u:$USER_SHELL_CMD"
  done >>"newusers.txt"
  newusers --badnames newusers.txt
  # Disable write permission on the directory to forbid remote sftp users to
  # remove their own root dir (they have already done it); we adjust that
  # here to avoid issues with EFS (see before)
  chmod 0555 "$HOME_DIR"
  # Clean up the tmpdir
  cd "$opwd"
  rm -rf "$tmpdir"
  return "$ret"
 
# Adjust user keys
_setup_user_keys()  
  if [ -f "$USER_KEYS" ]; then
    sed -n "/^[^#]/   s/:/ /p  " "$USER_KEYS"   while read -r _u _k; do
      echo "$_k" >>"$AUTH_KEYS_PATH/$_u"
    done
  fi
 
# Main function
exec_sshd()  
  _check_environment
  _setup_host_keys
  _setup_user_pass
  _setup_user_keys
  echo "Running: /usr/sbin/sshd $SSH_PARAMS"
  # shellcheck disable=SC2086
  exec /usr/sbin/sshd -D $SSH_PARAMS
 
# ----
# MAIN
# ----
case "$1" in
"server") exec_sshd ;;
*) exec "$@" ;;
esac
# vim: ts=2:sw=2:et

The container also includes a couple of auxiliary scripts, the first one can be used to generate the host_keys.txt file as follows:

$ docker run --rm stodh/mysecureshell gen-host-keys > host_keys.txt

Where the script is as simple as:

bin/gen-host-keys

#!/bin/sh
set -e
# Generate new host keys
ssh-keygen -A >/dev/null
# Replace hostname
sed -i -e 's/@.*$/@mysecureshell/' /etc/ssh/ssh_host_*_key.pub
# Print in mime format (stdout)
makemime /etc/ssh/ssh_host_*
# vim: ts=2:sw=2:et

And there is another script to generate a .tar file that contains auth data for the list of usernames passed to it (the file contains a user_pass.txt file with random passwords for the users, public and private ssh keys for them and the user_keys.txt file that matches the generated keys). To generate a tar file for the user scs we can execute the following:

$ docker run --rm stodh/mysecureshell gen-users-tar scs > /tmp/scs-users.tar

To see the contents and the text inside the user_pass.txt file we can do:

$ tar tvf /tmp/scs-users.tar
-rw-r--r-- root/root        21 2022-09-11 15:55 user_pass.txt
-rw-r--r-- root/root       822 2022-09-11 15:55 user_keys.txt
-rw------- root/root       387 2022-09-11 15:55 id_ed25519-scs
-rw-r--r-- root/root        85 2022-09-11 15:55 id_ed25519-scs.pub
-rw------- root/root      3357 2022-09-11 15:55 id_rsa-scs
-rw------- root/root      3243 2022-09-11 15:55 id_rsa-scs.pem
-rw-r--r-- root/root       729 2022-09-11 15:55 id_rsa-scs.pub
$ tar xfO /tmp/scs-users.tar user_pass.txt
scs:20JertRSX2Eaar4x

The source of the script is:

bin/gen-users-tar

#!/bin/sh
set -e
# ---------
# VARIABLES
# ---------
USER_KEYS_FILE="user_keys.txt"
USER_PASS_FILE="user_pass.txt"
# ---------
# MAIN CODE
# ---------
# Generate user passwords and keys, return 1 if no username is received
if [ "$#" -eq "0" ]; then
  return 1
fi
opwd="$(pwd)"
tmpdir="$(mktemp -d)"
cd "$tmpdir"
for u in "$@"; do
  ssh-keygen -q -a 100 -t ed25519 -f "id_ed25519-$u" -C "$u" -N ""
  ssh-keygen -q -a 100 -b 4096 -t rsa -f "id_rsa-$u" -C "$u" -N ""
  # Legacy RSA private key format
  cp -a "id_rsa-$u" "id_rsa-$u.pem"
  ssh-keygen -q -p -m pem -f "id_rsa-$u.pem" -N "" -P "" >/dev/null
  chmod 0600 "id_rsa-$u.pem"
  echo "$u:$(pwgen -s 16 1)" >>"$USER_PASS_FILE"
  echo "$u:$(cat "id_ed25519-$u.pub")" >>"$USER_KEYS_FILE"
  echo "$u:$(cat "id_rsa-$u.pub")" >>"$USER_KEYS_FILE"
done
tar cf - "$USER_PASS_FILE" "$USER_KEYS_FILE" id_* 2>/dev/null
cd "$opwd"
rm -rf "$tmpdir"
# vim: ts=2:sw=2:et

nginx-scsThe nginx-scs container is generated using the following Dockerfile:

ARG NGINX_VERSION=1.23.1
FROM nginx:$NGINX_VERSION
LABEL maintainer="Sergio Talens-Oliag <sto@mixinet.net>"
RUN rm -f /docker-entrypoint.d/*
COPY docker-entrypoint.d/* /docker-entrypoint.d/

Basically we are removing the existing docker-entrypoint.d scripts from the standard image and adding a new one that configures the web server as we want using a couple of environment variables:

AUTH_REQUEST_URI: URL to use for the auth_request, if the variable is not found on the environment auth_request is not used.
HTML_ROOT: Base directory of the web server, if not passed the default /usr/share/nginx/html is used.

Note that if we don t pass the variables everything works as if we were using the original nginx image. The contents of the configuration script are:

docker-entrypoint.d/10-update-default-conf.sh

#!/bin/sh
# Replace the default.conf nginx file by our own version.
set -e
if [ -z "$HTML_ROOT" ]; then
  HTML_ROOT="/usr/share/nginx/html"
fi
if [ "$AUTH_REQUEST_URI" ]; then
  cat >/etc/nginx/conf.d/default.conf <<EOF
server  
  listen       80;
  server_name  localhost;
  location /  
    auth_request /.auth;
    root  $HTML_ROOT;
    index index.html index.htm;
   
  location /.auth  
    internal;
    proxy_pass $AUTH_REQUEST_URI;
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
    proxy_set_header X-Original-URI \$request_uri;
   
  error_page   500 502 503 504  /50x.html;
  location = /50x.html  
    root /usr/share/nginx/html;
   
 
EOF
else
  cat >/etc/nginx/conf.d/default.conf <<EOF
server  
  listen       80;
  server_name  localhost;
  location /  
    root  $HTML_ROOT;
    index index.html index.htm;
   
  error_page   500 502 503 504  /50x.html;
  location = /50x.html  
    root /usr/share/nginx/html;
   
 
EOF
fi
# vim: ts=2:sw=2:et

As we will see later the idea is to use the /sftp/data or /sftp/data/scs folder as the root of the web published by this container and create an Ingress object to provide access to it outside of our kubernetes cluster.

webhook-scsThe webhook-scs container is generated using the following Dockerfile:

ARG ALPINE_VERSION=3.16.2
ARG GOLANG_VERSION=alpine3.16
FROM golang:$GOLANG_VERSION AS builder
LABEL maintainer="Sergio Talens-Oliag <sto@mixinet.net>"
ENV WEBHOOK_VERSION 2.8.0
ENV WEBHOOK_PR 549
ENV S3FS_VERSION v1.91
WORKDIR /go/src/github.com/adnanh/webhook
RUN apk update &&\
 apk add --no-cache -t build-deps curl libc-dev gcc libgcc patch
RUN curl -L --silent -o webhook.tar.gz\
 https://github.com/adnanh/webhook/archive/$ WEBHOOK_VERSION .tar.gz &&\
 tar xzf webhook.tar.gz --strip 1 &&\
 curl -L --silent -o $ WEBHOOK_PR .patch\
 https://patch-diff.githubusercontent.com/raw/adnanh/webhook/pull/$ WEBHOOK_PR .patch &&\
 patch -p1 < $ WEBHOOK_PR .patch &&\
 go get -d && \
 go build -o /usr/local/bin/webhook
WORKDIR /src/s3fs-fuse
RUN apk update &&\
 apk add ca-certificates build-base alpine-sdk libcurl automake autoconf\
 libxml2-dev libressl-dev mailcap fuse-dev curl-dev
RUN curl -L --silent -o s3fs.tar.gz\
 https://github.com/s3fs-fuse/s3fs-fuse/archive/refs/tags/$S3FS_VERSION.tar.gz &&\
 tar xzf s3fs.tar.gz --strip 1 &&\
 ./autogen.sh &&\
 ./configure --prefix=/usr/local &&\
 make -j && \
 make install
FROM alpine:$ALPINE_VERSION
LABEL maintainer="Sergio Talens-Oliag <sto@mixinet.net>"
WORKDIR /webhook
RUN apk update &&\
 apk add --no-cache ca-certificates mailcap fuse libxml2 libcurl libgcc\
 libstdc++ rsync util-linux-misc &&\
 rm -rf /var/cache/apk/*
COPY --from=builder /usr/local/bin/webhook /usr/local/bin/webhook
COPY --from=builder /usr/local/bin/s3fs /usr/local/bin/s3fs
COPY entrypoint.sh /
COPY hooks/* ./hooks/
EXPOSE 9000
ENTRYPOINT ["/entrypoint.sh"]
CMD ["server"]

Again, we use a multi-stage build because in production we wanted to support a functionality that is not already on the official versions (streaming the command output as a response instead of waiting until the execution ends); this time we build the image applying the PATCH included on this pull request against a released version of the source instead of creating a fork. The entrypoint.sh script is used to generate the webhook configuration file for the existing hooks using environment variables (basically the WEBHOOK_WORKDIR and the *_TOKEN variables) and launch the webhook service:

entrypoint.sh

#!/bin/sh
set -e
# ---------
# VARIABLES
# ---------
WEBHOOK_BIN="$ WEBHOOK_BIN:-/webhook/hooks "
WEBHOOK_YML="$ WEBHOOK_YML:-/webhook/scs.yml "
WEBHOOK_OPTS="$ WEBHOOK_OPTS:--verbose "
# ---------
# FUNCTIONS
# ---------
print_du_yml()  
  cat <<EOF
- id: du
  execute-command: '$WEBHOOK_BIN/du.sh'
  command-working-directory: '$WORKDIR'
  response-headers:
  - name: 'Content-Type'
    value: 'application/json'
  http-methods: ['GET']
  include-command-output-in-response: true
  include-command-output-in-response-on-error: true
  pass-arguments-to-command:
  - source: 'url'
    name: 'path'
  pass-environment-to-command:
  - source: 'string'
    envname: 'OUTPUT_FORMAT'
    name: 'json'
EOF
 
print_hardlink_yml()  
  cat <<EOF
- id: hardlink
  execute-command: '$WEBHOOK_BIN/hardlink.sh'
  command-working-directory: '$WORKDIR'
  http-methods: ['GET']
  include-command-output-in-response: true
  include-command-output-in-response-on-error: true
EOF
 
print_s3sync_yml()  
  cat <<EOF
- id: s3sync
  execute-command: '$WEBHOOK_BIN/s3sync.sh'
  command-working-directory: '$WORKDIR'
  http-methods: ['POST']
  include-command-output-in-response: true
  include-command-output-in-response-on-error: true
  pass-environment-to-command:
  - source: 'payload'
    envname: 'AWS_KEY'
    name: 'aws.key'
  - source: 'payload'
    envname: 'AWS_SECRET_KEY'
    name: 'aws.secret_key'
  - source: 'payload'
    envname: 'S3_BUCKET'
    name: 's3.bucket'
  - source: 'payload'
    envname: 'S3_REGION'
    name: 's3.region'
  - source: 'payload'
    envname: 'S3_PATH'
    name: 's3.path'
  - source: 'payload'
    envname: 'SCS_PATH'
    name: 'scs.path'
  stream-command-output: true
EOF
 
print_token_yml()  
  if [ "$1" ]; then
    cat << EOF
  trigger-rule:
    match:
      type: 'value'
      value: '$1'
      parameter:
        source: 'header'
        name: 'X-Webhook-Token'
EOF
  fi
 
exec_webhook()  
  # Validate WORKDIR
  if [ -z "$WEBHOOK_WORKDIR" ]; then
    echo "Must define the WEBHOOK_WORKDIR variable!" >&2
    exit 1
  fi
  WORKDIR="$(realpath "$WEBHOOK_WORKDIR" 2>/dev/null)"   true
  if [ ! -d "$WORKDIR" ]; then
    echo "The WEBHOOK_WORKDIR '$WEBHOOK_WORKDIR' is not a directory!" >&2
    exit 1
  fi
  # Get TOKENS, if the DU_TOKEN or HARDLINK_TOKEN is defined that is used, if
  # not if the COMMON_TOKEN that is used and in other case no token is checked
  # (that is the default)
  DU_TOKEN="$ DU_TOKEN:-$COMMON_TOKEN "
  HARDLINK_TOKEN="$ HARDLINK_TOKEN:-$COMMON_TOKEN "
  S3_TOKEN="$ S3_TOKEN:-$COMMON_TOKEN "
  # Create webhook configuration
    
    print_du_yml
    print_token_yml "$DU_TOKEN"
    echo ""
    print_hardlink_yml
    print_token_yml "$HARDLINK_TOKEN"
    echo ""
    print_s3sync_yml
    print_token_yml "$S3_TOKEN"
   >"$WEBHOOK_YML"
  # Run the webhook command
  # shellcheck disable=SC2086
  exec webhook -hooks "$WEBHOOK_YML" $WEBHOOK_OPTS
 
# ----
# MAIN
# ----
case "$1" in
"server") exec_webhook ;;
*) exec "$@" ;;
esac

The entrypoint.sh script generates the configuration file for the webhook server calling functions that print a yaml section for each hook and optionally adds rules to validate access to them comparing the value of a X-Webhook-Token header against predefined values. The expected token values are taken from environment variables, we can define a token variable for each hook (DU_TOKEN, HARDLINK_TOKEN or S3_TOKEN) and a fallback value (COMMON_TOKEN); if no token variable is defined for a hook no check is done and everybody can call it. The Hook Definition documentation explains the options you can use for each hook, the ones we have right now do the following:

du: runs on the $WORKDIR directory, passes as first argument to the script the value of the path query parameter and sets the variable OUTPUT_FORMAT to the fixed value json (we use that to print the output of the script in JSON format instead of text).
hardlink: runs on the $WORKDIR directory and takes no parameters.
s3sync: runs on the $WORKDIR directory and sets a lot of environment variables from values read from the JSON encoded payload sent by the caller (all the values must be sent by the caller even if they are assigned an empty value, if they are missing the hook fails without calling the script); we also set the stream-command-output value to true to make the script show its output as it is working (we patched the webhook source to be able to use this option).

The du hook scriptThe du hook script code checks if the argument passed is a directory, computes its size using the du command and prints the results in text format or as a JSON dictionary:

hooks/du.sh

#!/bin/sh
set -e
# Script to print disk usage for a PATH inside the scs folder
# ---------
# FUNCTIONS
# ---------
print_error()  
  if [ "$OUTPUT_FORMAT" = "json" ]; then
    echo " \"error\":\"$*\" "
  else
    echo "$*" >&2
  fi
  exit 1
 
usage()  
  if [ "$OUTPUT_FORMAT" = "json" ]; then
    echo " \"error\":\"Pass arguments as '?path=XXX\" "
  else
    echo "Usage: $(basename "$0") PATH" >&2
  fi
  exit 1
 
# ----
# MAIN
# ----
if [ "$#" -eq "0" ]   [ -z "$1" ]; then
  usage
fi
if [ "$1" = "." ]; then
  DU_PATH="./"
else
  DU_PATH="$(find . -name "$1" -mindepth 1 -maxdepth 1)"   true
fi
if [ -z "$DU_PATH" ]   [ ! -d "$DU_PATH/." ]; then
  print_error "The provided PATH ('$1') is not a directory"
fi
# Print disk usage in bytes for the given PATH
OUTPUT="$(du -b -s "$DU_PATH")"
if [ "$OUTPUT_FORMAT" = "json" ]; then
  # Format output as  "path":"PATH","bytes":"BYTES" 
  echo "$OUTPUT"  
    sed -e "s%^\(.*\)\t.*/\(.*\)$% \"path\":\"\2\",\"bytes\":\"\1\" %"  
    tr -d '\n'
else
  # Print du output as is
  echo "$OUTPUT"
fi
# vim: ts=2:sw=2:et:ai:sts=2

The hardlink hook scriptThe hardlink hook script is really simple, it just runs the util-linux version of the hardlink command on its working directory:

hooks/hardlink.sh

#!/bin/sh
hardlink --ignore-time --maximize .

We use that to reduce the size of the stored content; to manage versions of files and folders we keep each version on a separate directory and when one or more files are not changed this script makes them hardlinks to the same file on disc, reducing the space used on disk.

The s3sync hook scriptThe s3sync hook script uses the s3fs tool to mount a bucket and synchronise data between a folder inside the bucket and a directory on the filesystem using rsync; all values needed to execute the task are taken from environment variables:

hooks/s3sync.sh

#!/bin/ash
set -euo pipefail
set -o errexit
set -o errtrace
# Functions
finish()  
  ret="$1"
  echo ""
  echo "Script exit code: $ret"
  exit "$ret"
 
# Check variables
if [ -z "$AWS_KEY" ]   [ -z "$AWS_SECRET_KEY" ]   [ -z "$S3_BUCKET" ]  
  [ -z "$S3_PATH" ]   [ -z "$SCS_PATH" ]; then
  [ "$AWS_KEY" ]   echo "Set the AWS_KEY environment variable"
  [ "$AWS_SECRET_KEY" ]   echo "Set the AWS_SECRET_KEY environment variable"
  [ "$S3_BUCKET" ]   echo "Set the S3_BUCKET environment variable"
  [ "$S3_PATH" ]   echo "Set the S3_PATH environment variable"
  [ "$SCS_PATH" ]   echo "Set the SCS_PATH environment variable"
  finish 1
fi
if [ "$S3_REGION" ] && [ "$S3_REGION" != "us-east-1" ]; then
  EP_URL="endpoint=$S3_REGION,url=https://s3.$S3_REGION.amazonaws.com"
else
  EP_URL="endpoint=us-east-1"
fi
# Prepare working directory
WORK_DIR="$(mktemp -p "$HOME" -d)"
MNT_POINT="$WORK_DIR/s3data"
PASSWD_S3FS="$WORK_DIR/.passwd-s3fs"
# Check the moutpoint
if [ ! -d "$MNT_POINT" ]; then
  mkdir -p "$MNT_POINT"
elif mountpoint "$MNT_POINT"; then
  echo "There is already something mounted on '$MNT_POINT', aborting!"
  finish 1
fi
# Create password file
touch "$PASSWD_S3FS"
chmod 0400 "$PASSWD_S3FS"
echo "$AWS_KEY:$AWS_SECRET_KEY" >"$PASSWD_S3FS"
# Mount s3 bucket as a filesystem
s3fs -o dbglevel=info,retries=5 -o "$EP_URL" -o "passwd_file=$PASSWD_S3FS" \
  "$S3_BUCKET" "$MNT_POINT"
echo "Mounted bucket '$S3_BUCKET' on '$MNT_POINT'"
# Remove the password file, just in case
rm -f "$PASSWD_S3FS"
# Check source PATH
ret="0"
SRC_PATH="$MNT_POINT/$S3_PATH"
if [ ! -d "$SRC_PATH" ]; then
  echo "The S3_PATH '$S3_PATH' can't be found!"
  ret=1
fi
# Compute SCS_UID & SCS_GID (by default based on the working directory owner)
SCS_UID="$ SCS_UID:=$(stat -c "%u" "." 2>/dev/null) "   true
SCS_GID="$ SCS_GID:=$(stat -c "%g" "." 2>/dev/null) "   true
# Check destination PATH
DST_PATH="./$SCS_PATH"
if [ "$ret" -eq "0" ] && [ -d "$DST_PATH" ]; then
  mkdir -p "$DST_PATH"   ret="$?"
fi
# Copy using rsync
if [ "$ret" -eq "0" ]; then
  rsync -rlptv --chown="$SCS_UID:$SCS_GID" --delete --stats \
    "$SRC_PATH/" "$DST_PATH/"   ret="$?"
fi
# Unmount the S3 bucket
umount -f "$MNT_POINT"
echo "Called umount for '$MNT_POINT'"
# Remove mount point dir
rmdir "$MNT_POINT"
# Remove WORK_DIR
rmdir "$WORK_DIR"
# We are done
finish "$ret"
# vim: ts=2:sw=2:et:ai:sts=2

Deployment objectsThe system is deployed as a StatefulSet with one replica. Our production deployment is done on AWS and to be able to scale we use EFS for our PersistenVolume; the idea is that the volume has no size limit, its AccessMode can be set to ReadWriteMany and we can mount it from multiple instances of the Pod without issues, even if they are in different availability zones. For development we use k3d and we are also able to scale the StatefulSet for testing because we use a ReadWriteOnce PVC, but it points to a hostPath that is backed up by a folder that is mounted on all the compute nodes, so in reality Pods in different k3d nodes use the same folder on the host.

secrets.yamlThe secrets file contains the files used by the mysecureshell container that can be generated using kubernetes pods as follows (we are only creating the scs user):

$ kubectl run "mysecureshell" --restart='Never' --quiet --rm --stdin \
  --image "stodh/mysecureshell:latest" -- gen-host-keys >"./host_keys.txt"
$ kubectl run "mysecureshell" --restart='Never' --quiet --rm --stdin \
  --image "stodh/mysecureshell:latest" -- gen-users-tar scs >"./users.tar"

Once we have the files we can generate the secrets.yaml file as follows:

$ tar xf ./users.tar user_keys.txt user_pass.txt
$ kubectl --dry-run=client -o yaml create secret generic "scs-secret" \
  --from-file="host_keys.txt=host_keys.txt" \
  --from-file="user_keys.txt=user_keys.txt" \
  --from-file="user_pass.txt=user_pass.txt" > ./secrets.yaml

The resulting secrets.yaml will look like the following file (the base64 would match the content of the files, of course):

secrets.yaml

apiVersion: v1
data:
  host_keys.txt: TWlt...
  user_keys.txt: c2Nz...
  user_pass.txt: c2Nz...
kind: Secret
metadata:
  creationTimestamp: null
  name: scs-secret

pvc.yamlThe persistent volume claim for a simple deployment (one with only one instance of the statefulSet) can be as simple as this:

pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scs-pvc
  labels:
    app.kubernetes.io/name: scs
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi

On this definition we don t set the storageClassName to use the default one.

Volumes in our development environment (k3d)In our development deployment we create the following PersistentVolume as required by the Local Persistence Volume Static Provisioner (note that the /volumes/scs-pv has to be created by hand, in our k3d system we mount the same host directory on the /volumes path of all the nodes and create the scs-pv directory by hand before deploying the persistent volume):

k3d-pv.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: scs-pv
  labels:
    app.kubernetes.io/name: scs
spec:
  capacity:
    storage: 8Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  claimRef:
    name: scs-pvc
  storageClassName: local-storage
  local:
    path: /volumes/scs-pv
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
          - k3s

And to make sure that everything works as expected we update the PVC definition to add the right storageClassName:

k3d-pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scs-pvc
  labels:
    app.kubernetes.io/name: scs
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: local-storage

Volumes in our production environment (aws)In the production deployment we don t create the PersistentVolume (we are using the aws-efs-csi-driver which supports Dynamic Provisioning) but we add the storageClassName (we set it to the one mapped to the EFS driver, i.e. efs-sc) and set ReadWriteMany as the accessMode:

efs-pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scs-pvc
  labels:
    app.kubernetes.io/name: scs
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 8Gi
  storageClassName: efs-sc

statefulset.yamlThe definition of the statefulSet is as follows:

statefulset.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: scs
  labels:
    app.kubernetes.io/name: scs
spec:
  serviceName: scs
  replicas: 1
  selector:
    matchLabels:
      app: scs
  template:
    metadata:
      labels:
        app: scs
    spec:
      containers:
      - name: nginx
        image: stodh/nginx-scs:latest
        ports:
        - containerPort: 80
          name: http
        env:
        - name: AUTH_REQUEST_URI
          value: ""
        - name: HTML_ROOT
          value: /sftp/data
        volumeMounts:
        - mountPath: /sftp
          name: scs-datadir
      - name: mysecureshell
        image: stodh/mysecureshell:latest
        ports:
        - containerPort: 22
          name: ssh
        securityContext:
          capabilities:
            add:
            - IPC_OWNER
        env:
        - name: SFTP_UID
          value: '2020'
        - name: SFTP_GID
          value: '2020'
        volumeMounts:
        - mountPath: /secrets
          name: scs-file-secrets
          readOnly: true
        - mountPath: /sftp
          name: scs-datadir
      - name: webhook
        image: stodh/webhook-scs:latest
        securityContext:
          privileged: true
        ports:
        - containerPort: 9000
          name: webhook-http
        env:
        - name: WEBHOOK_WORKDIR
          value: /sftp/data/scs
        volumeMounts:
        - name: devfuse
          mountPath: /dev/fuse
        - mountPath: /sftp
          name: scs-datadir
      volumes:
      - name: devfuse
        hostPath:
          path: /dev/fuse
      - name: scs-file-secrets
        secret:
          secretName: scs-secrets
      - name: scs-datadir
        persistentVolumeClaim:
          claimName: scs-pvc

Notes about the containers:

nginx: As this is an example the web server is not using an AUTH_REQUEST_URI and uses the /sftp/data directory as the root of the web (to get to the files uploaded for the scs user we will need to use /scs/ as a prefix on the URLs).
mysecureshell: We are adding the IPC_OWNER capability to the container to be able to use some of the sftp-* commands inside it, but they are not really needed, so adding the capability is optional.
webhook: We are launching this container in privileged mode to be able to use the s3fs-fuse, as it will not work otherwise for now (see this kubernetes issue); if the functionality is not needed the container can be executed with regular privileges; besides, as we are not enabling public access to this service we don t define *_TOKEN variables (if required the values should be read from a Secret object).

Notes about the volumes:

the devfuse volume is only needed if we plan to use the s3fs command on the webhook container, if not we can remove the volume definition and its mounts.

service.yamlTo be able to access the different services on the statefulset we publish the relevant ports using the following Service object:

service.yaml

apiVersion: v1
kind: Service
metadata:
  name: scs-svc
  labels:
    app.kubernetes.io/name: scs
spec:
  ports:
  - name: ssh
    port: 22
    protocol: TCP
    targetPort: 22
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  - name: webhook-http
    port: 9000
    protocol: TCP
    targetPort: 9000
  selector:
    app: scs

ingress.yamlTo download the scs files from the outside we can add an ingress object like the following (the definition is for testing using the localhost name):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: scs-ingress
  labels:
    app.kubernetes.io/name: scs
spec:
  ingressClassName: nginx
  rules:
  - host: 'localhost'
    http:
      paths:
      - path: /scs
        pathType: Prefix
        backend:
          service:
            name: scs-svc
            port:
              number: 80

DeploymentTo deploy the statefulSet we create a namespace and apply the object definitions shown before:

$ kubectl create namespace scs-demo
namespace/scs-demo created
$ kubectl -n scs-demo apply -f secrets.yaml
secret/scs-secrets created
$ kubectl -n scs-demo apply -f pvc.yaml
persistentvolumeclaim/scs-pvc created
$ kubectl -n scs-demo apply -f statefulset.yaml
statefulset.apps/scs created
$ kubectl -n scs-demo apply -f service.yaml
service/scs-svc created
$ kubectl -n scs-demo apply -f ingress.yaml
ingress.networking.k8s.io/scs-ingress created

Once the objects are deployed we can check that all is working using kubectl:

$ kubectl  -n scs-demo get all,secrets,ingress
NAME        READY   STATUS    RESTARTS   AGE
pod/scs-0   3/3     Running   0          24s
NAME            TYPE       CLUSTER-IP  EXTERNAL-IP  PORT(S)                  AGE
service/scs-svc ClusterIP  10.43.0.47  <none>       22/TCP,80/TCP,9000/TCP   21s

NAME                   READY   AGE
statefulset.apps/scs   1/1     24s
NAME                         TYPE                                  DATA   AGE
secret/default-token-mwcd7   kubernetes.io/service-account-token   3      53s
secret/scs-secrets           Opaque                                3      39s
NAME                                   CLASS  HOSTS      ADDRESS     PORTS   AGE
ingress.networking.k8s.io/scs-ingress  nginx  localhost  172.21.0.5  80      17s

At this point we are ready to use the system.

Usage examples

File uploadsAs previously mentioned in our system the idea is to use the sftp server from other Pods, but to test the system we are going to do a kubectl port-forward and connect to the server using our host client and the password we have generated (it is on the user_pass.txt file, inside the users.tar archive):

$ kubectl -n scs-demo port-forward service/scs-svc 2020:22 &
Forwarding from 127.0.0.1:2020 -> 22
Forwarding from [::1]:2020 -> 22
$ PF_PID=$!
$ sftp -P 2020 scs@127.0.0.1                                                 1
Handling connection for 2020
The authenticity of host '[127.0.0.1]:2020 ([127.0.0.1]:2020)' can't be \
  established.
ED25519 key fingerprint is SHA256:eHNwCnyLcSSuVXXiLKeGraw0FT/4Bb/yjfqTstt+088.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[127.0.0.1]:2020' (ED25519) to the list of known \
  hosts.
scs@127.0.0.1's password: **********
Connected to 127.0.0.1.
sftp> ls -la
drwxr-xr-x    2 sftp     sftp         4096 Sep 25 14:47 .
dr-xr-xr-x    3 sftp     sftp         4096 Sep 25 14:36 ..
sftp> !date -R > /tmp/date.txt                                               2
sftp> put /tmp/date.txt .
Uploading /tmp/date.txt to /date.txt
date.txt                                      100%   32    27.8KB/s   00:00
sftp> ls -l
-rw-r--r--    1 sftp     sftp           32 Sep 25 15:21 date.txt
sftp> ln date.txt date.txt.1                                                 3
sftp> ls -l
-rw-r--r--    2 sftp     sftp           32 Sep 25 15:21 date.txt
-rw-r--r--    2 sftp     sftp           32 Sep 25 15:21 date.txt.1
sftp> put /tmp/date.txt date.txt.2                                           4
Uploading /tmp/date.txt to /date.txt.2
date.txt                                      100%   32    27.8KB/s   00:00
sftp> ls -l                                                                  5
-rw-r--r--    2 sftp     sftp           32 Sep 25 15:21 date.txt
-rw-r--r--    2 sftp     sftp           32 Sep 25 15:21 date.txt.1
-rw-r--r--    1 sftp     sftp           32 Sep 25 15:21 date.txt.2
sftp> exit
$ kill "$PF_PID"
[1]  + terminated  kubectl -n scs-demo port-forward service/scs-svc 2020:22

We connect to the sftp service on the forwarded port with the scs user.
We put a file we have created on the host on the directory.
We do a hard link of the uploaded file.
We put a second copy of the file we created locally.
On the file list we can see that the two first files have two hardlinks

File retrievalsIf our ingress is configured right we can download the date.txt file from the URL http://localhost/scs/date.txt:
$ curl -s http://localhost/scs/date.txt Sun, 25 Sep 2022 17:21:51 +0200

Use of the webhook containerTo finish this post we are going to show how we can call the hooks directly, from a CronJob and from a Job.
Direct script call (du)In our deployment the direct calls are done from other Pods, to simulate it we are going to do a port-forward and call the script with an existing PATH (the root directory) and a bad one:
$ kubectl -n scs-demo port-forward service/scs-svc 9000:9000 >/dev/null & $ PF_PID=$! $ JSON="$(curl -s "http://localhost:9000/hooks/du?path=.")" $ echo $JSON "path":"","bytes":"4160" $ JSON="$(curl -s "http://localhost:9000/hooks/du?path=foo")" $ echo $JSON "error":"The provided PATH ('foo') is not a directory" $ kill $PF_PID
As we only have files on the base directory we print the disk usage of the . PATH and the output is in json format because we export OUTPUT_FORMAT with the value json on the webhook configuration.

Cronjobs (hardlink)As explained before, the webhook container can be used to run cronjobs; the following one uses an alpine container to call the hardlink script each minute (that setup is for testing, obviously):
webhook-cronjob.yaml
apiVersion: batch/v1 kind: CronJob metadata: name: hardlink labels: cronjob: 'hardlink' spec: schedule: "* */1 * * *" concurrencyPolicy: Replace jobTemplate: spec: template: metadata: labels: cronjob: 'hardlink' spec: containers: - name: hardlink-cronjob image: alpine:latest command: ["wget", "-q", "-O-", "http://scs-svc:9000/hooks/hardlink"] restartPolicy: Never
The following console session shows how we create the object, allow a couple of executions and remove it (in production we keep it running but once a day, not each minute):
$ kubectl -n scs-demo apply -f webhook-cronjob.yaml 1 cronjob.batch/hardlink created $ kubectl -n scs-demo get pods -l "cronjob=hardlink" -w 2 NAME READY STATUS RESTARTS AGE hardlink-27735351-zvpnb 0/1 Pending 0 0s hardlink-27735351-zvpnb 0/1 ContainerCreating 0 0s hardlink-27735351-zvpnb 0/1 Completed 0 2s ^C $ kubectl -n scs-demo logs pod/hardlink-27735351-zvpnb 3 Mode: real Method: sha256 Files: 3 Linked: 1 files Compared: 0 xattrs Compared: 1 files Saved: 32 B Duration: 0.000220 seconds $ sleep 60 $ kubectl -n scs-demo get pods -l "cronjob=hardlink" 4 NAME READY STATUS RESTARTS AGE hardlink-27735351-zvpnb 0/1 Completed 0 83s hardlink-27735352-br5rn 0/1 Completed 0 23s $ kubectl -n scs-demo logs pod/hardlink-27735352-br5rn 5 Mode: real Method: sha256 Files: 3 Linked: 0 files Compared: 0 xattrs Compared: 0 files Saved: 0 B Duration: 0.000070 seconds $ kubectl -n scs-demo delete -f webhook-cronjob.yaml 6 cronjob.batch "hardlink" deleted
This command creates the cronjob object.
This checks the pods with our cronjob label, we interrupt it once we see that the first run has been completed.
With this command we see the output of the execution, as this is the fist execution we see that date.txt.2 has been replaced by a hardlink (the summary does not name the file, but it is the only option knowing the contents from the original upload).
After waiting a little bit we check the pods executed again to get the name of the latest one.
The log now shows that nothing was done.
As this is a demo, we delete the cronjob.

Jobs (s3sync)The following job can be used to synchronise the contents of a directory in a S3 bucket with the SCS Filesystem:
job.yaml
apiVersion: batch/v1 kind: Job metadata: name: s3sync labels: cronjob: 's3sync' spec: template: metadata: labels: cronjob: 's3sync' spec: containers: - name: s3sync-job image: alpine:latest command: - "wget" - "-q" - "--header" - "Content-Type: application/json" - "--post-file" - "/secrets/s3sync.json" - "-O-" - "http://scs-svc:9000/hooks/s3sync" volumeMounts: - mountPath: /secrets name: job-secrets readOnly: true restartPolicy: Never volumes: - name: job-secrets secret: secretName: webhook-job-secrets
The file with parameters for the script must be something like this:
s3sync.json
"aws": "key": "********************", "secret_key": "****************************************" , "s3": "region": "eu-north-1", "bucket": "blogops-test", "path": "test" , "scs": "path": "test"
Once we have both files we can run the Job as follows:
$ kubectl -n scs-demo create secret generic webhook-job-secrets \ 1 --from-file="s3sync.json=s3sync.json" secret/webhook-job-secrets created $ kubectl -n scs-demo apply -f webhook-job.yaml 2 job.batch/s3sync created $ kubectl -n scs-demo get pods -l "cronjob=s3sync" 3 NAME READY STATUS RESTARTS AGE s3sync-zx2cj 0/1 Completed 0 12s $ kubectl -n scs-demo logs s3sync-zx2cj 4 Mounted bucket 's3fs-test' on '/root/tmp.jiOjaF/s3data' sending incremental file list created directory ./test ./ kyso.png Number of files: 2 (reg: 1, dir: 1) Number of created files: 2 (reg: 1, dir: 1) Number of deleted files: 0 Number of regular files transferred: 1 Total file size: 15,075 bytes Total transferred file size: 15,075 bytes Literal data: 15,075 bytes Matched data: 0 bytes File list size: 0 File list generation time: 0.147 seconds File list transfer time: 0.000 seconds Total bytes sent: 15,183 Total bytes received: 74 sent 15,183 bytes received 74 bytes 30,514.00 bytes/sec total size is 15,075 speedup is 0.99 Called umount for '/root/tmp.jiOjaF/s3data' Script exit code: 0 $ kubectl -n scs-demo delete -f webhook-job.yaml 5 job.batch "s3sync" deleted $ kubectl -n scs-demo delete secrets webhook-job-secrets 6 secret "webhook-job-secrets" deleted
Here we create the webhook-job-secrets secret that contains the s3sync.json file.
This command runs the job.
Checking the label cronjob=s3sync we get the Pods executed by the job.
Here we print the logs of the completed job.
Once we are finished we remove the Job.
And also the secret.

Final remarksThis post has been longer than I expected, but I believe it can be useful for someone; in any case, next time I ll try to explain something shorter or will split it into multiple entries.

6 September 2022

Shirish Agarwal: Debian on Phone

History Before I start, the game I was talking about is called Cell To Singularity. Now I haven t gone much in the game as I have shared but think that the Singularity it refers to is the Technological Singularity that people think will happen. Whether that will happen or not is open to debate to one and all. This is going to be a bit long one. Confession Time :- When I was sharing in the blog post, I had no clue that we actually had sessions on it in this year s Debconf. I just saw the schedule yesterday and then came to know. Then I saw Guido s two talks, one at Debconf as well as one as Froscon. In fact, saw the Froscon talk first, and then the one at Debconf. Both the talks are nearly the same except for a thing here or a thing there. Now because I was not there so my understanding and knowledge would be disadvantageously asymmetrical to Guido and others who were there and could talk and share more. Having a Debian mobile or Debian on the mobile could also make Debian more popular and connectable to the masses, one of the things that were not pointed out in the Debian India BOF sadly. At the same time, there are some facts that are not on the table and hence not thought about. Being a B.Com person, I have been following not just the technical but also how the economics work and smartphone penetration in India is pretty low or historically been very low, say around 3-4% while the majority that people use, almost 90-95% of the market uses what are called non-smartphones or dumbphones. Especially during the pandemic and even after that the dumbphones market actually went up while smartphones stagnated and even came down. There is a lot of inventory at most of the dealers that they can t get rid of. From a dealer perspective, it probably makes more sense to buy and sell dumbphones more in number as the turnaround of capital is much faster and easier than for smartphones. I have seen people spend a number of hours and rightly so in order to make their minds up on a smartphone while for a dumbphone, it is a 10-minute thing. Ask around, figure out who is selling at the cheapest, and just buy. Most of these low-end phones are coming from China. In fact, even in the middle and getting even into smartphones, the Chinese are the masters from whom we buy, even as they have occupied Indian territory. In the top five, Samsung comes at number three of four (sharing about Samsung as a fan and having used them.) even though battery times are atrocious, especially with Android 12L. The only hope that most of the smartphone manufacturers have is lowering the sticker prices and hoping that 5G Adoption picks up and that is what they are betting on but that comes with its own share of drawbacks as can be seen.
GNOME, MATE, memory leaks, Payments FWIW, while I do have GNOME and do use a couple of tools from the GNOME stack, I hate GNOME with a passion. I have been a mate user for almost a decade now and really love the simplicity that mate has vis-a-vis GNOME. And with each release, MATE has only become better. So, it would be nice if we can have MATE on the mobile phone. How adaptive the apps might be on the smaller area, I dunno. It would be interesting to find out if and how people are looking at debugging memory leaks on mobile phones. Although finding memory leaks on any platform is good, finding them and fixing them on a mobile phone is pretty much critical as most phones have fixed & relatively small amounts of memory and it is and can get quickly exhausted. One of the things that were asked in the Q&A was about payments. The interesting thing is both UK and India are the same or markedly similar in regard as far as contactless payments being concerned. What most Indians have or use is basically UPI which is basically backed by your bank. Unlike in some other countries where you have a selection of wallets and even temporary/permanent virtual accounts whereby you can minimize your risks in case your mobile gets stolen or something, here we don t have that. There are three digital wallets that I know Paytm Not used (have heard it s creepy, but don t really know), Google pay (Unfortunately, this is the one I use, they bought multiple features, and in the last couple of years have really taken the game away from Paytm but also creepy.). The last one is Samsung Pay (haven t really used it as their find my phone app. always crashes, dunno how it is supposed to work.) But I do find that the apps. are vulnerable. Every day there is some or other news of fraud happening. Previously, only States like Bihar and Jharkhand used to be infamous for cybercrime as a hub, but now even States like Andhra Pradesh have joined and surpassed them :(. People have lost lakhs and crores, this is just a few days back. Some more info. on UPI can be found here and GitHub has a few implementation examples that anybody could look at and run away with it.
Balancing on three things For any new mobile phone to crack the market, it has to balance three things. One, achieve economies of scale. Unless, that is not taken care of or done, however good or bad the product might be, it remains a niche and dies after some time. While Guido shared about Openmoko and N900, one of the more interesting bits from a user perspective at least was the OLPC project. There are many nuances that the short article didn t go through. While I can t say for other countries, at least in India, no education initiative happens without corruption. And perhaps Nicholas s hands were tied while other manufacturers would and could do to achieve their sales targets. In India, it flopped because there was no way for volunteers to buy or get OLPC unless they were part of a school or college. There was some traction in FOSS communities, but that died down once OLPC did the partnership with MS-Windows, and proverbially broke the camel s back. FWIW, I think the idea, the concept, and even the machine were far ahead of their time. The other two legs are support and Warranty Without going into any details, I can share and tell there were quite a few OLPC type attempts using conventional laptops or using Android and FOSS or others or even using one of the mainstream distributions but the problems have always been polishing, training and support. Guido talked about privacy as a winning feature but fails to take into account that people want to know that their privacy isn t being violated. If a mobile phone answers to Hey Google does it mean it was passively gathering, storing, and sending info to third parties, we just don t know. The mobile phone could be part of the right to repair profile while at the same time it can force us to ask many questions about the way things currently are and going to be. Six months down the line all the flagships of all companies are working on being able to take and share through satellites (Satellite Internet) and perhaps maybe a few non-flagships. Of course, if you are going to use a satellite, then you are going to drain that much more quickly. In all and every event there are always gonna be tradeoffs. The Debian-mobile mailing list doesn t seem to have many takers. The latest I could find there is written by Paul Wise. I am in a similar boat (Samsung; SM-M526B; Lahaina; arm64-v8a) v12. It is difficult to know which release would work on your machine, make sure that the building from the source is not tainted and pristine and needs a way to backup and restore if you need to. I even tried installing GNURoot Debian and the Xserver alternative they had shared but was unable to use the touch interface on the fakeroot instance . The system talks about a back key but what back key I have no clue.
Precursor Events Debconf 2023 As far as precursor events are concerned before Debconf 23 in India, all the festivals that we have could be used to showcase Debian. In fact, the ongoing Ganesh Chaturthi would have been the perfect way to showcase Debian and apps. according to the audience. Even the festival of Durga Puja, Diwali etc. can be used. When commercial organizations use the same festivals, why can t we? What perhaps we would need to figure out is the funding part as well as getting permissions from Municipal authorities. One of the things for e.g. that we could do is buy either a permanent 24 monitor or a 34 TV and use that to display Debian and apps. The bigger, the better. Something that we could use day to day and also is used for events. This would require significant amounts of energy so we could approach companies, small businesses and individuals both for volunteering as well as helping out with funding. Somebody asked how we could do online stuff and why it is somewhat boring. What could be done for e.g. instead of 4-5 hrs. of things, break it into manageable 45 minute pieces. 4-5 hrs. is long and is gonna fatigue the best of people. Make it into 45-minute negotiable chunks, and intersphere it with jokes, hacks, anecdotes, and war stories. People do not like or want to be talked down to but rather converse. One of the things that I saw many of the artists do is have shows and limit the audience to 20-24 people on zoom call or whatever videoconferencing system you have and play with them. The passive audience enjoys the play between the standup guy and the crowd he works on, some of them may be known to him personally so he can push that envelope a bit more. The same thing can be applied here. Share the passion, and share why we are doing something. For e.g. you could do smem -t -k less and give a whole talk about how memory is used and freed during a session, how are things different on desktop and ARM as far as memory architecture is concerned (if there is). What is being done on the hardware side, what is on the software side and go on and on. Then share about troubleshooting applications. Valgrind is super slow and makes life hell, is there some better app ? Doesn t matter if you are a front-end or a back-end developer you need to know this and figure out the best way to deal with in your app/program. That would have lot of value. And this is just an e.g. to help trigger more ideas from the community. I am sure others probably have more fun ideas as to what can be done. I am stopping here now otherwise would just go on, till later. Feel free to comment, feedback. Hope it generates some more thinking and excitement on the grey cells.

30 August 2022

John Goerzen: The PC & Internet Revolution in Rural America

Inspired by several others (such as Alex Schroeder s post and Szcze uja s prompt), as well as a desire to get this down for my kids, I figure it s time to write a bit about living through the PC and Internet revolution where I did: outside a tiny town in rural Kansas. And, as I ve been back in that same area for the past 15 years, I reflect some on the challenges that continue to play out. Although the stories from the others were primarily about getting online, I want to start by setting some background. Those of you that didn t grow up in the same era as I did probably never realized that a typical business PC setup might cost $10,000 in today s dollars, for instance. So let me start with the background.
Nothing was easy This story begins in the 1980s. Somewhere around my Kindergarten year of school, around 1985, my parents bought a TRS-80 Color Computer 2 (aka CoCo II). It had 64K of RAM and used a TV for display and sound. This got you the computer. It didn t get you any disk drive or anything, no joysticks (required by a number of games). So whenever the system powered down, or it hung and you had to power cycle it a frequent event you d lose whatever you were doing and would have to re-enter the program, literally by typing it in. The floppy drive for the CoCo II cost more than the computer, and it was quite common for people to buy the computer first and then the floppy drive later when they d saved up the money for that. I particularly want to mention that computers then didn t come with a modem. What would be like buying a laptop or a tablet without wifi today. A modem, which I ll talk about in a bit, was another expensive accessory. To cobble together a system in the 80s that was capable of talking to others with persistent storage (floppy, or hard drive), screen, keyboard, and modem would be quite expensive. Adjusted for inflation, if you re talking a PC-style device (a clone of the IBM PC that ran DOS), this would easily be more expensive than the Macbook Pros of today. Few people back in the 80s had a computer at home. And the portion of those that had even the capability to get online in a meaningful way was even smaller. Eventually my parents bought a PC clone with 640K RAM and dual floppy drives. This was primarily used for my mom s work, but I did my best to take it over whenever possible. It ran DOS and, despite its monochrome screen, was generally a more capable machine than the CoCo II. For instance, it supported lowercase. (I m not even kidding; the CoCo II pretty much didn t.) A while later, they purchased a 32MB hard drive for it what luxury! Just getting a machine to work wasn t easy. Say you d bought a PC, and then bought a hard drive, and a modem. You didn t just plug in the hard drive and it would work. You would have to fight it every step of the way. The BIOS and DOS partition tables of the day used a cylinder/head/sector method of addressing the drive, and various parts of that those addresses had too few bits to work with the big drives of the day above 20MB. So you would have to lie to the BIOS and fdisk in various ways, and sort of work out how to do it for each drive. For each peripheral serial port, sound card (in later years), etc., you d have to set jumpers for DMA and IRQs, hoping not to conflict with anything already in the system. Perhaps you can now start to see why USB and PCI were so welcomed.
Sharing and finding resources Despite the two computers in our home, it wasn t as if software written on one machine just ran on another. A lot of software for PC clones assumed a CGA color display. The monochrome HGC in our PC wasn t particularly compatible. You could find a TSR program to emulate the CGA on the HGC, but it wasn t particularly stable, and there s only so much you can do when a program that assumes color displays on a monitor that can only show black, dark amber, or light amber. So I d periodically get to use other computers most commonly at an office in the evening when it wasn t being used. There were some local computer clubs that my dad took me to periodically. Software was swapped back then; disks copied, shareware exchanged, and so forth. For me, at least, there was no online to download software from, and selling software over the Internet wasn t a thing at all.
Three Different Worlds There were sort of three different worlds of computing experience in the 80s:

Home users. Initially using a wide variety of software from Apple, Commodore, Tandy/RadioShack, etc., but eventually coming to be mostly dominated by IBM PC clones

Small and mid-sized business users. Some of them had larger minicomputers or small mainframes, but most that I had contact with by the early 90s were standardized on DOS-based PCs. More advanced ones had a network running Netware, most commonly. Networking hardware and software was generally too expensive for home users to use in the early days.

Universities and large institutions. These are the places that had the mainframes, the earliest implementations of TCP/IP, the earliest users of UUCP, and so forth.

The difference between the home computing experience and the large institution experience were vast. Not only in terms of dollars the large institution hardware could easily cost anywhere from tens of thousands to millions of dollars but also in terms of sheer resources required (large rooms, enormous power circuits, support staff, etc). Nothing was in common between them; not operating systems, not software, not experience. I was never much aware of the third category until the differences started to collapse in the mid-90s, and even then I only was exposed to it once the collapse was well underway. You might say to me, Well, Google certainly isn t running what I m running at home! And, yes of course, it s different. But fundamentally, most large datacenters are running on x86_64 hardware, with Linux as the operating system, and a TCP/IP network. It s a different scale, obviously, but at a fundamental level, the hardware and operating system stack are pretty similar to what you can readily run at home. Back in the 80s and 90s, this wasn t the case. TCP/IP wasn t even available for DOS or Windows until much later, and when it was, it was a clunky beast that was difficult. One of the things Kevin Driscoll highlights in his book called Modem World see my short post about it is that the history of the Internet we usually receive is focused on case 3: the large institutions. In reality, the Internet was and is literally a network of networks. Gateways to and from Internet existed from all three kinds of users for years, and while TCP/IP ultimately won the battle of the internetworking protocol, the other two streams of users also shaped the Internet as we now know it. Like many, I had no access to the large institution networks, but as I ve been reflecting on my experiences, I ve found a new appreciation for the way that those of us that grew up with primarily home PCs shaped the evolution of today s online world also.
An Era of Scarcity I should take a moment to comment about the cost of software back then. A newspaper article from 1985 comments that WordPerfect, then the most powerful word processing program, sold for $495 (or $219 if you could score a mail order discount). That s $1360/$600 in 2022 money. Other popular software, such as Lotus 1-2-3, was up there as well. If you were to buy a new PC clone in the mid to late 80s, it would often cost $2000 in 1980s dollars. Now add a printer a low-end dot matrix for $300 or a laser for $1500 or even more. A modem: another $300. So the basic system would be $3600, or $9900 in 2022 dollars. If you wanted a nice printer, you re now pushing well over $10,000 in 2022 dollars. You start to see one barrier here, and also why things like shareware and piracy if it was indeed even recognized as such were common in those days. So you can see, from a home computer setup (TRS-80, Commodore C64, Apple ][, etc) to a business-class PC setup was an order of magnitude increase in cost. From there to the high-end minis/mainframes was another order of magnitude (at least!) increase. Eventually there was price pressure on the higher end and things all got better, which is probably why the non-DOS PCs lasted until the early 90s.
Increasing Capabilities My first exposure to computers in school was in the 4th grade, when I would have been about 9. There was a single Apple ][ machine in that room. I primarily remember playing Oregon Trail on it. The next year, the school added a computer lab. Remember, this is a small rural area, so each graduating class might have about 25 people in it; this lab was shared by everyone in the K-8 building. It was full of some flavor of IBM PS/2 machines running DOS and Netware. There was a dedicated computer teacher too, though I think she was a regular teacher that was given somewhat minimal training on computers. We were going to learn typing that year, but I did so well on the very first typing program that we soon worked out that I could do programming instead. I started going to school early these machines were far more powerful than the XT at home and worked on programming projects there. Eventually my parents bought me a Gateway 486SX/25 with a VGA monitor and hard drive. Wow! This was a whole different world. It may have come with Windows 3.0 or 3.1 on it, but I mainly remember running OS/2 on that machine. More on that below.
Programming That CoCo II came with a BASIC interpreter in ROM. It came with a large manual, which served as a BASIC tutorial as well. The BASIC interpreter was also the shell, so literally you could not use the computer without at least a bit of BASIC. Once I had access to a DOS machine, it also had a basic interpreter: GW-BASIC. There was a fair bit of software written in BASIC at the time, but most of the more advanced software wasn t. I wondered how these .EXE and .COM programs were written. I could find vague references to DEBUG.EXE, assemblers, and such. But it wasn t until I got a copy of Turbo Pascal that I was able to do that sort of thing myself. Eventually I got Borland C++ and taught myself C as well. A few years later, I wanted to try writing GUI programs for Windows, and bought Watcom C++ much cheaper than the competition, and it could target Windows, DOS (and I think even OS/2). Notice that, aside from BASIC, none of this was free, and none of it was bundled. You couldn t just download a C compiler, or Python interpreter, or whatnot back then. You had to pay for the ability to write any kind of serious code on the computer you already owned.
The Microsoft Domination Microsoft came to dominate the PC landscape, and then even the computing landscape as a whole. IBM very quickly lost control over the hardware side of PCs as Compaq and others made clones, but Microsoft has managed in varying degrees even to this day to keep a stranglehold on the software, and especially the operating system, side. Yes, there was occasional talk of things like DR-DOS, but by and large the dominant platform came to be the PC, and if you had a PC, you ran DOS (and later Windows) from Microsoft. For awhile, it looked like IBM was going to challenge Microsoft on the operating system front; they had OS/2, and when I switched to it sometime around the version 2.1 era in 1993, it was unquestionably more advanced technically than the consumer-grade Windows from Microsoft at the time. It had Internet support baked in, could run most DOS and Windows programs, and had introduced a replacement for the by-then terrible FAT filesystem: HPFS, in 1988. Microsoft wouldn t introduce a better filesystem for its consumer operating systems until Windows XP in 2001, 13 years later. But more on that story later.
Free Software, Shareware, and Commercial Software I ve covered the high cost of software already. Obviously $500 software wasn t going to sell in the home market. So what did we have? Mainly, these things:

Public domain software. It was free to use, and if implemented in BASIC, probably had source code with it too.

Shareware

Commercial software (some of it from small publishers was a lot cheaper than $500)

Let s talk about shareware. The idea with shareware was that a company would release a useful program, sometimes limited. You were encouraged to register , or pay for, it if you liked it and used it. And, regardless of whether you registered it or not, were told please copy! Sometimes shareware was fully functional, and registering it got you nothing more than printed manuals and an easy conscience (guilt trips for not registering weren t necessarily very subtle). Sometimes unregistered shareware would have a nag screen a delay of a few seconds while they told you to register. Sometimes they d be limited in some way; you d get more features if you registered. With games, it was popular to have a trilogy, and release the first episode inevitably ending with a cliffhanger as shareware, and the subsequent episodes would require registration. In any event, a lot of software people used in the 80s and 90s was shareware. Also pirated commercial software, though in the earlier days of computing, I think some people didn t even know the difference. Notice what s missing: Free Software / FLOSS in the Richard Stallman sense of the word. Stallman lived in the big institution world after all, he worked at MIT and what he was doing with the Free Software Foundation and GNU project beginning in 1983 never really filtered into the DOS/Windows world at the time. I had no awareness of it even existing until into the 90s, when I first started getting some hints of it as a port of gcc became available for OS/2. The Internet was what really brought this home, but I m getting ahead of myself. I want to say again: FLOSS never really entered the DOS and Windows 3.x ecosystems. You d see it make a few inroads here and there in later versions of Windows, and moreso now that Microsoft has been sort of forced to accept it, but still, reflect on its legacy. What is the software market like in Windows compared to Linux, even today? Now it is, finally, time to talk about connectivity!
Getting On-Line What does it even mean to get on line? Certainly not connecting to a wifi access point. The answer is, unsurprisingly, complex. But for everyone except the large institutional users, it begins with a telephone.
The telephone system By the 80s, there was one communication network that already reached into nearly every home in America: the phone system. Virtually every household (note I don t say every person) was uniquely identified by a 10-digit phone number. You could, at least in theory, call up virtually any other phone in the country and be connected in less than a minute. But I ve got to talk about cost. The way things worked in the USA, you paid a monthly fee for a phone line. Included in that monthly fee was unlimited local calling. What is a local call? That was an extremely complex question. Generally it meant, roughly, calling within your city. But of course, as you deal with things like suburbs and cities growing into each other (eg, the Dallas-Ft. Worth metroplex), things got complicated fast. But let s just say for simplicity you could call others in your city. What about calling people not in your city? That was long distance , and you paid often hugely by the minute for it. Long distance rates were difficult to figure out, but were generally most expensive during business hours and cheapest at night or on weekends. Prices eventually started to come down when competition was introduced for long distance carriers, but even then you often were stuck with a single carrier for long distance calls outside your city but within your state. Anyhow, let s just leave it at this: local calls were virtually free, and long distance calls were extremely expensive.
Getting a modem I remember getting a modem that ran at either 1200bps or 2400bps. Either way, quite slow; you could often read even plain text faster than the modem could display it. But what was a modem? A modem hooked up to a computer with a serial cable, and to the phone system. By the time I got one, modems could automatically dial and answer. You would send a command like ATDT5551212 and it would dial 555-1212. Modems had speakers, because often things wouldn t work right, and the telephone system was oriented around speech, so you could hear what was happening. You d hear it wait for dial tone, then dial, then hopefully the remote end would ring, a modem there would answer, you d hear the screeching of a handshake, and eventually your terminal would say CONNECT 2400. Now your computer was bridged to the other; anything going out your serial port was encoded as sound by your modem and decoded at the other end, and vice-versa. But what, exactly, was the other end? It might have been another person at their computer. Turn on local echo, and you can see what they did. Maybe you d send files to each other. But in my case, the answer was different: PC Magazine.
PC Magazine and CompuServe Starting around 1986 (so I would have been about 6 years old), I got to read PC Magazine. My dad would bring copies that were being discarded at his office home for me to read, and I think eventually bought me a subscription directly. This was not just a standard magazine; it ran something like 350-400 pages an issue, and came out every other week. This thing was a monster. It had reviews of hardware and software, descriptions of upcoming technologies, pages and pages of ads (that often had some degree of being informative to them). And they had sections on programming. Many issues would talk about BASIC or Pascal programming, and there d be a utility in most issues. What do I mean by a utility in most issues ? Did they include a floppy disk with software? No, of course not. There was a literal program listing printed in the magazine. If you wanted the utility, you had to type it in. And a lot of them were written in assembler, so you had to have an assembler. An assembler, of course, was not free and I didn t have one. Or maybe they wrote it in Microsoft C, and I had Borland C, and (of course) they weren t compatible. Sometimes they would list the program sort of in binary: line after line of a BASIC program, with lines like 64, 193, 253, 0, 53, 0, 87 that you would type in for hours, hopefully correctly. Running the BASIC program would, if you got it correct, emit a .COM file that you could then run. They did have a rudimentary checksum system built in, but it wasn t even a CRC, so something like swapping two numbers you d never notice except when the program would mysteriously hang. Eventually they teamed up with CompuServe to offer a limited slice of CompuServe for the purpose of downloading PC Magazine utilities. This was called PC MagNet. I am foggy on the details, but I believe that for a time you could connect to the limited PC MagNet part of CompuServe for free (after the cost of the long-distance call, that is) rather than paying for CompuServe itself (because, OF COURSE, that also charged you per the minute.) So in the early days, I would get special permission from my parents to place a long distance call, and after some nerve-wracking minutes in which we were aware every minute was racking up charges, I could navigate the menus, download what I wanted, and log off immediately. I still, incidentally, mourn what PC Magazine became. As with computing generally, it followed the mass market. It lost its deep technical chops, cut its programming columns, stopped talking about things like how SCSI worked, and so forth. By the time it stopped printing in 2009, it was no longer a square-bound 400-page beheamoth, but rather looked more like a copy of Newsweek, but with less depth.
Continuing with CompuServe CompuServe was a much larger service than just PC MagNet. Eventually, our family got a subscription. It was still an expensive and scarce resource; I d call it only after hours when the long-distance rates were cheapest. Everyone had a numerical username separated by commas; mine was 71510,1421. CompuServe had forums, and files. Eventually I would use TapCIS to queue up things I wanted to do offline, to minimize phone usage online. CompuServe eventually added a gateway to the Internet. For the sum of somewhere around $1 a message, you could send or receive an email from someone with an Internet email address! I remember the thrill of one time, as a kid of probably 11 years, sending a message to one of the editors of PC Magazine and getting a kind, if brief, reply back! But inevitably I had
The Godzilla Phone Bill Yes, one month I became lax in tracking my time online. I ran up my parents phone bill. I don t remember how high, but I remember it was hundreds of dollars, a hefty sum at the time. As I watched Jason Scott s BBS Documentary, I realized how common an experience this was. I think this was the end of CompuServe for me for awhile.
Toll-Free Numbers I lived near a town with a population of 500. Not even IN town, but near town. The calling area included another town with a population of maybe 1500, so all told, there were maybe 2000 people total I could talk to with a local call though far fewer numbers, because remember, telephones were allocated by the household. There was, as far as I know, zero modems that were a local call (aside from one that belonged to a friend I met in around 1992). So basically everything was long-distance. But there was a special feature of the telephone network: toll-free numbers. Normally when calling long-distance, you, the caller, paid the bill. But with a toll-free number, beginning with 1-800, the recipient paid the bill. These numbers almost inevitably belonged to corporations that wanted to make it easy for people to call. Sales and ordering lines, for instance. Some of these companies started to set up modems on toll-free numbers. There were few of these, but they existed, so of course I had to try them! One of them was a company called PennyWise that sold office supplies. They had a toll-free line you could call with a modem to order stuff. Yes, online ordering before the web! I loved office supplies. And, because I lived far from a big city, if the local K-Mart didn t have it, I probably couldn t get it. Of course, the interface was entirely text, but you could search for products and place orders with the modem. I had loads of fun exploring the system, and actually ordered things from them and probably actually saved money doing so. With the first order they shipped a monster full-color catalog. That thing must have been 500 pages, like the Sears catalogs of the day. Every item had a part number, which streamlined ordering through the modem.
Inbound FAXes By the 90s, a number of modems became able to send and receive FAXes as well. For those that don t know, a FAX machine was essentially a special modem. It would scan a page and digitally transmit it over the phone system, where it would at least in the early days be printed out in real time (because the machines didn t have the memory to store an entire page as an image). Eventually, PC modems integrated FAX capabilities. There still wasn t anything useful I could do locally, but there were ways I could get other companies to FAX something to me. I remember two of them. One was for US Robotics. They had an on demand FAX system. You d call up a toll-free number, which was an automated IVR system. You could navigate through it and select various documents of interest to you: spec sheets and the like. You d key in your FAX number, hang up, and US Robotics would call YOU and FAX you the documents you wanted. Yes! I was talking to a computer (of a sorts) at no cost to me! The New York Times also ran a service for awhile called TimesFax. Every day, they would FAX out a page or two of summaries of the day s top stories. This was pretty cool in an era in which I had no other way to access anything from the New York Times. I managed to sign up for TimesFax I have no idea how, anymore and for awhile I would get a daily FAX of their top stories. When my family got its first laser printer, I could them even print these FAXes complete with the gothic New York Times masthead. Wow! (OK, so technically I could print it on a dot-matrix printer also, but graphics on a 9-pin dot matrix is a kind of pain that is a whole other article.)
My own phone line Remember how I discussed that phone lines were allocated per household? This was a problem for a lot of reasons:

Anybody that tried to call my family while I was using my modem would get a busy signal (unable to complete the call)

If anybody in the house picked up the phone while I was using it, that would degrade the quality of the ongoing call and either mess up or disconnect the call in progress. In many cases, that could cancel a file transfer (which wasn t necessarily easy or possible to resume), prompting howls of annoyance from me.

Generally we all had to work around each other

So eventually I found various small jobs and used the money I made to pay for my own phone line and my own long distance costs. Eventually I upgraded to a 28.8Kbps US Robotics Courier modem even! Yes, you heard it right: I got a job and a bank account so I could have a phone line and a faster modem. Uh, isn t that why every teenager gets a job? Now my local friend and I could call each other freely at least on my end (I can t remember if he had his own phone line too). We could exchange files using HS/Link, which had the added benefit of allowing split-screen chat even while a file transfer is in progress. I m sure we spent hours chatting to each other keyboard-to-keyboard while sharing files with each other.
Technology in Schools By this point in the story, we re in the late 80s and early 90s. I m still using PC-style OSs at home; OS/2 in the later years of this period, DOS or maybe a bit of Windows in the earlier years. I mentioned that they let me work on programming at school starting in 5th grade. It was soon apparent that I knew more about computers than anybody on staff, and I started getting pulled out of class to help teachers or administrators with vexing school problems. This continued until I graduated from high school, incidentally often to my enjoyment, and the annoyance of one particular teacher who, I must say, I was fine with annoying in this way. That s not to say that there was institutional support for what I was doing. It was, after all, a small school. Larger schools might have introduced BASIC or maybe Logo in high school. But I had already taught myself BASIC, Pascal, and C by the time I was somewhere around 12 years old. So I wouldn t have had any use for that anyhow. There were programming contests occasionally held in the area. Schools would send teams. My school didn t really send anybody, but I went as an individual. One of them was run by a local college (but for jr. high or high school students. Years later, I met one of the professors that ran it. He remembered me, and that day, better than I did. The programming contest had problems one could solve in BASIC or Logo. I knew nothing about what to expect going into it, but I had lugged my computer and screen along, and asked him, Can I write my solutions in C? He was, apparently, stunned, but said sure, go for it. I took first place that day, leading to some rather confused teams from much larger schools. The Netware network that the school had was, as these generally were, itself isolated. There was no link to the Internet or anything like it. Several schools across three local counties eventually invested in a fiber-optic network linking them together. This built a larger, but still closed, network. Its primary purpose was to allow students to be exposed to a wider variety of classes at high schools. Participating schools had an ITV room , outfitted with cameras and mics. So students at any school could take classes offered over ITV at other schools. For instance, only my school taught German classes, so people at any of those participating schools could take German. It was an early Zoom room. But alongside the TV signal, there was enough bandwidth to run some Netware frames. By about 1995 or so, this let one of the schools purchase some CD-ROM software that was made available on a file server and could be accessed by any participating school. Nice! But Netware was mainly about file and printer sharing; there wasn t even a facility like email, at least not on our deployment.
BBSs My last hop before the Internet was the BBS. A BBS was a computer program, usually ran by a hobbyist like me, on a computer with a modem connected. Callers would call it up, and they d interact with the BBS. Most BBSs had discussion groups like forums and file areas. Some also had games. I, of course, continued to have that most vexing of problems: they were all long-distance. There were some ways to help with that, chiefly QWK and BlueWave. These, somewhat like TapCIS in the CompuServe days, let me download new message posts for reading offline, and queue up my own messages to send later. QWK and BlueWave didn t help with file downloading, though.
BBSs get networked BBSs were an interesting thing. You d call up one, and inevitably somewhere in the file area would be a BBS list. Download the BBS list and you ve suddenly got a list of phone numbers to try calling. All of them were long distance, of course. You d try calling them at random and have a success rate of maybe 20%. The other 80% would be defunct; you might get the dreaded this number is no longer in service or the even more dreaded angry human answering the phone (and of course a modem can t talk to a human, so they d just get silence for probably the nth time that week). The phone company cared nothing about BBSs and recycled their numbers just as fast as any others. To talk to various people, or participate in certain discussion groups, you d have to call specific BBSs. That s annoying enough in the general case, but even more so for someone paying long distance for it all, because it takes a few minutes to establish a connection to a BBS: handshaking, logging in, menu navigation, etc. But BBSs started talking to each other. The earliest successful such effort was FidoNet, and for the duration of the BBS era, it remained by far the largest. FidoNet was analogous to the UUCP that the institutional users had, but ran on the much cheaper PC hardware. Basically, BBSs that participated in FidoNet would relay email, forum posts, and files between themselves overnight. Eventually, as with UUCP, by hopping through this network, messages could reach around the globe, and forums could have worldwide participation asynchronously, long before they could link to each other directly via the Internet. It was almost entirely volunteer-run.
Running my own BBS At age 13, I eventually chose to set up my own BBS. It ran on my single phone line, so of course when I was dialing up something else, nobody could dial up me. Not that this was a huge problem; in my town of 500, I probably had a good 1 or 2 regular callers in the beginning. In the PC era, there was a big difference between a server and a client. Server-class software was expensive and rare. Maybe in later years you had an email client, but an email server would be completely unavailable to you as a home user. But with a BBS, I could effectively run a server. I even ran serial lines in our house so that the BBS could be connected from other rooms! Since I was running OS/2, the BBS didn t tie up the computer; I could continue using it for other things. FidoNet had an Internet email gateway. This one, unlike CompuServe s, was free. Once I had a BBS on FidoNet, you could reach me from the Internet using the FidoNet address. This didn t support attachments, but then email of the day didn t really, either. Various others outside Kansas ran FidoNet distribution points. I believe one of them was mgmtsys; my memory is quite vague, but I think they offered a direct gateway and I would call them to pick up Internet mail via FidoNet protocols, but I m not at all certain of this.
Pros and Cons of the Non-Microsoft World As mentioned, Microsoft was and is the dominant operating system vendor for PCs. But I left that world in 1993, and here, nearly 30 years later, have never really returned. I got an operating system with more technical capabilities than the DOS and Windows of the day, but the tradeoff was a much smaller software ecosystem. OS/2 could run DOS programs, but it ran OS/2 programs a lot better. So if I were to run a BBS, I wanted one that had a native OS/2 version limiting me to a small fraction of available BBS server software. On the other hand, as a fully 32-bit operating system, there started to be OS/2 ports of certain software with a Unix heritage; most notably for me at the time, gcc. At some point, I eventually came across the RMS essays and started to be hooked.
Internet: The Hunt Begins I certainly was aware that the Internet was out there and interesting. But the first problem was: how the heck do I get connected to the Internet?
Learning Link and Gopher ISPs weren t really a thing; the first one in my area (though still a long-distance call) started in, I think, 1994. One service that one of my teachers got me hooked up with was Learning Link. Learning Link was a nationwide collaboration of PBS stations and schools, designed to build on the educational mission of PBS. The nearest Learning Link station was more than a 3-hour drive away but critically, they had a toll-free access number, and my teacher convinced them to let me use it. I connected via a terminal program and a modem, like with most other things. I don t remember much about it, but I do remember a very important thing it had: Gopher. That was my first experience with Gopher. Learning Link was hosted by a Unix derivative (Xenix), but it didn t exactly give everyone a shell. I seem to recall it didn t have open FTP access either. The Gopher client had FTP access at some point; I don t recall for sure if it did then. If it did, then when a Gopher server referred to an FTP server, I could get to it. (I am unclear at this point if I could key in an arbitrary FTP location, or knew how, at that time.) I also had email access there, but I don t recall exactly how; probably Pine. If that s correct, that would have dated my Learning Link access as no earlier than 1992. I think my access time to Learning Link was limited. And, since the only way to get out on the Internet from there was Gopher and Pine, I was somewhat limited in terms of technology as well. I believe that telnet services, for instance, weren t available to me.
Computer labs There was one place that tended to have Internet access: colleges and universities. In 7th grade, I participated in a program that resulted in me being invited to visit Duke University, and in 8th grade, I participated in National History Day, resulting in a trip to visit the University of Maryland. I probably sought out computer labs at both of those. My most distinct memory was finding my way into a computer lab at one of those universities, and it was full of NeXT workstations. I had never seen or used NeXT before, and had no idea how to operate it. I had brought a box of floppy disks, unaware that the DOS disks probably weren t compatible with NeXT. Closer to home, a small college had a computer lab that I could also visit. I would go there in summer or when it wasn t used with my stack of floppies. I remember downloading disk images of FLOSS operating systems: FreeBSD, Slackware, or Debian, at the time. The hash marks from the DOS-based FTP client would creep across the screen as the 1.44MB disk images would slowly download. telnet was also available on those machines, so I could telnet to things like public-access Archie servers and libraries though not Gopher. Still, FTP and telnet access opened up a lot, and I learned quite a bit in those years.
Continuing the Journey At some point, I got a copy of the Whole Internet User s Guide and Catalog, published in 1994. I still have it. If it hadn t already figured it out by then, I certainly became aware from it that Unix was the dominant operating system on the Internet. The examples in Whole Internet covered FTP, telnet, gopher all assuming the user somehow got to a Unix prompt. The web was introduced about 300 pages in; clearly viewed as something that wasn t page 1 material. And it covered the command-line www client before introducing the graphical Mosaic. Even then, though, the book highlighted Mosaic s utility as a front-end for Gopher and FTP, and even the ability to launch telnet sessions by clicking on links. But having a copy of the book didn t equate to having any way to run Mosaic. The machines in the computer lab I mentioned above all ran DOS and were incapable of running a graphical browser. I had no SLIP or PPP (both ways to run Internet traffic over a modem) connectivity at home. In short, the Web was something for the large institutional users at the time.
CD-ROMs As CD-ROMs came out, with their huge (for the day) 650MB capacity, various companies started collecting software that could be downloaded on the Internet and selling it on CD-ROM. The two most popular ones were Walnut Creek CD-ROM and Infomagic. One could buy extensive Shareware and gaming collections, and then even entire Linux and BSD distributions. Although not exactly an Internet service per se, it was a way of bringing what may ordinarily only be accessible to institutional users into the home computer realm.
Free Software Jumps In As I mentioned, by the mid 90s, I had come across RMS s writings about free software most probably his 1992 essay Why Software Should Be Free. (Please note, this is not a commentary on the more recently-revealed issues surrounding RMS, but rather his writings and work as I encountered them in the 90s.) The notion of a Free operating system not just in cost but in openness was incredibly appealing. Not only could I tinker with it to a much greater extent due to having source for everything, but it included so much software that I d otherwise have to pay for. Compilers! Interpreters! Editors! Terminal emulators! And, especially, server software of all sorts. There d be no way I could afford or run Netware, but with a Free Unixy operating system, I could do all that. My interest was obviously piqued. Add to that the fact that I could actually participate and contribute I was about to become hooked on something that I ve stayed hooked on for decades. But then the question was: which Free operating system? Eventually I chose FreeBSD to begin with; that would have been sometime in 1995. I don t recall the exact reasons for that. I remember downloading Slackware install floppies, and probably the fact that Debian wasn t yet at 1.0 scared me off for a time. FreeBSD s fantastic Handbook far better than anything I could find for Linux at the time was no doubt also a factor.
The de Raadt Factor Why not NetBSD or OpenBSD? The short answer is Theo de Raadt. Somewhere in this time, when I was somewhere between 14 and 16 years old, I asked some questions comparing NetBSD to the other two free BSDs. This was on a NetBSD mailing list, but for some reason Theo saw it and got a flame war going, which CC d me. Now keep in mind that even if NetBSD had a web presence at the time, it would have been minimal, and I would have not all that unusually for the time had no way to access it. I was certainly not aware of the, shall we say, acrimony between Theo and NetBSD. While I had certainly seen an online flamewar before, this took on a different and more disturbing tone; months later, Theo randomly emailed me under the subject SLIME saying that I was, well, SLIME . I seem to recall periodic emails from him thereafter reminding me that he hates me and that he had blocked me. (Disclaimer: I have poor email archives from this period, so the full details are lost to me, but I believe I am accurately conveying these events from over 25 years ago) This was a surprise, and an unpleasant one. I was trying to learn, and while it is possible I didn t understand some aspect or other of netiquette (or Theo s personal hatred of NetBSD) at the time, still that is not a reason to flame a 16-year-old (though he would have had no way to know my age). This didn t leave any kind of scar, but did leave a lasting impression; to this day, I am particularly concerned with how FLOSS projects handle poisonous people. Debian, for instance, has come a long way in this over the years, and even Linus Torvalds has turned over a new leaf. I don t know if Theo has. In any case, I didn t use NetBSD then. I did try it periodically in the years since, but never found it compelling enough to justify a large switch from Debian. I never tried OpenBSD for various reasons, but one of them was that I didn t want to join a community that tolerates behavior such as Theo s from its leader.
Moving to FreeBSD Moving from OS/2 to FreeBSD was final. That is, I didn t have enough hard drive space to keep both. I also didn t have the backup capacity to back up OS/2 completely. My BBS, which ran Virtual BBS (and at some point also AdeptXBBS) was deleted and reincarnated in a different form. My BBS was a member of both FidoNet and VirtualNet; the latter was specific to VBBS, and had to be dropped. I believe I may have also had to drop the FidoNet link for a time. This was the biggest change of computing in my life to that point. The earlier experiences hadn t literally destroyed what came before. OS/2 could still run my DOS programs. Its command shell was quite DOS-like. It ran Windows programs. I was going to throw all that away and leap into the unknown. I wish I had saved a copy of my BBS; I would love to see the messages I exchanged back then, or see its menu screens again. I have little memory of what it looked like. But other than that, I have no regrets. Pursuing Free, Unixy operating systems brought me a lot of enjoyment and a good career. That s not to say it was easy. All the problems of not being in the Microsoft ecosystem were magnified under FreeBSD and Linux. In a day before EDID, monitor timings had to be calculated manually and you risked destroying your monitor if you got them wrong. Word processing and spreadsheet software was pretty much not there for FreeBSD or Linux at the time; I was therefore forced to learn LaTeX and actually appreciated that. Software like PageMaker or CorelDraw was certainly nowhere to be found for those free operating systems either. But I got a ton of new capabilities. I mentioned the BBS didn t shut down, and indeed it didn t. I ran what was surely a supremely unique oddity: a free, dialin Unix shell server in the middle of a small town in Kansas. I m sure I provided things such as pine for email and some help text and maybe even printouts for how to use it. The set of callers slowly grew over the time period, in fact. And then I got UUCP.
Enter UUCP Even throughout all this, there was no local Internet provider and things were still long distance. I had Internet Email access via assorted strange routes, but they were all strange. And, I wanted access to Usenet. In 1995, it happened. The local ISP I mentioned offered UUCP access. Though I couldn t afford the dialup shell (or later, SLIP/PPP) that they offered due to long-distance costs, UUCP s very efficient batched processes looked doable. I believe I established that link when I was 15, so in 1995. I worked to register my domain, complete.org, as well. At the time, the process was a bit lengthy and involved downloading a text file form, filling it out in a precise way, sending it to InterNIC, and probably mailing them a check. Well I did that, and in September of 1995, complete.org became mine. I set up sendmail on my local system, as well as INN to handle the limited Usenet newsfeed I requested from the ISP. I even ran Majordomo to host some mailing lists, including some that were surprisingly high-traffic for a few-times-a-day long-distance modem UUCP link! The modem client programs for FreeBSD were somewhat less advanced than for OS/2, but I believe I wound up using Minicom or Seyon to continue to dial out to BBSs and, I believe, continue to use Learning Link. So all the while I was setting up my local BBS, I continued to have access to the text Internet, consisting of chiefly Gopher for me.
Switching to Debian I switched to Debian sometime in 1995 or 1996, and have been using Debian as my primary OS ever since. I continued to offer shell access, but added the WorldVU Atlantis menuing BBS system. This provided a return of a more BBS-like interface (by default; shell was still an uption) as well as some BBS door games such as LoRD and TradeWars 2002, running under DOS emulation. I also continued to run INN, and ran ifgate to allow FidoNet echomail to be presented into INN Usenet-like newsgroups, and netmail to be gated to Unix email. This worked pretty well. The BBS continued to grow in these days, peaking at about two dozen total user accounts, and maybe a dozen regular users.
Dial-up access availability I believe it was in 1996 that dial up PPP access finally became available in my small town. What a thrill! FINALLY! I could now FTP, use Gopher, telnet, and the web all from home. Of course, it was at modem speeds, but still. (Strangely, I have a memory of accessing the Web using WebExplorer from OS/2. I don t know exactly why; it s possible that by this time, I had upgraded to a 486 DX2/66 and was able to reinstall OS/2 on the old 25MHz 486, or maybe something was wrong with the timeline from my memories from 25 years ago above. Or perhaps I made the occasional long-distance call somewhere before I ditched OS/2.) Gopher sites still existed at this point, and I could access them using Netscape Navigator which likely became my standard Gopher client at that point. I don t recall using UMN text-mode gopher client locally at that time, though it s certainly possible I did.
The city Starting when I was 15, I took computer science classes at Wichita State University. The first one was a class in the summer of 1995 on C++. I remember being worried about being good enough for it I was, after all, just after my HS freshman year and had never taken the prerequisite C class. I loved it and got an A! By 1996, I was taking more classes. In 1996 or 1997 I stayed in Wichita during the day due to having more than one class. So, what would I do then but enjoy the computer lab? The CS dept. had two of them: one that had NCD X terminals connected to a pair of SunOS servers, and another one running Windows. I spent most of the time in the Unix lab with the NCDs; I d use Netscape or pine, write code, enjoy the University s fast Internet connection, and so forth. In 1997 I had graduated high school and that summer I moved to Wichita to attend college. As was so often the case, I shut down the BBS at that time. It would be 5 years until I again dealt with Internet at home in a rural community. By the time I moved to my apartment in Wichita, I had stopped using OS/2 entirely. I have no memory of ever having OS/2 there. Along the way, I had bought a Pentium 166, and then the most expensive piece of computing equipment I have ever owned: a DEC Alpha, which, of course, ran Linux.
ISDN I must have used dialup PPP for a time, but I eventually got a job working for the ISP I had used for UUCP, and then PPP. While there, I got a 128Kbps ISDN line installed in my apartment, and they gave me a discount on the service for it. That was around 3x the speed of a modem, and crucially was always on and gave me a public IP. No longer did I have to use UUCP; now I got to host my own things! By at least 1998, I was running a web server on www.complete.org, and I had an FTP server going as well.
Even Bigger Cities In 1999 I moved to Dallas, and there got my first broadband connection: an ADSL link at, I think, 1.5Mbps! Now that was something! But it had some reliability problems. I eventually put together a server and had it hosted at an acquantaince s place who had SDSL in his apartment. Within a couple of years, I had switched to various kinds of proper hosting for it, but that is a whole other article. In Indianapolis, I got a cable modem for the first time, with even tighter speeds but prohibitions on running servers on it. Yuck.
Challenges Being non-Microsoft continued to have challenges. Until the advent of Firefox, a web browser was one of the biggest. While Netscape supported Linux on i386, it didn t support Linux on Alpha. I hobbled along with various attempts at emulators, old versions of Mosaic, and so forth. And, until StarOffice was open-sourced as Open Office, reading Microsoft file formats was also a challenge, though WordPerfect was briefly available for Linux. Over the years, I have become used to the Linux ecosystem. Perhaps I use Gimp instead of Photoshop and digikam instead of well, whatever somebody would use on Windows. But I get ZFS, and containers, and so much that isn t available there. Yes, I know Apple never went away and is a thing, but for most of the time period I discuss in this article, at least after the rise of DOS, it was niche compared to the PC market.
Back to Kansas In 2002, I moved back to Kansas, to a rural home near a different small town in the county next to where I grew up. Over there, it was back to dialup at home, but I had faster access at work. I didn t much care for this, and thus began a 20+-year effort to get broadband in the country. At first, I got a wireless link, which worked well enough in the winter, but had serious problems in the summer when the trees leafed out. Eventually DSL became available locally highly unreliable, but still, it was something. Then I moved back to the community I grew up in, a few miles from where I grew up. Again I got DSL a bit better. But after some years, being at the end of the run of DSL meant I had poor speeds and reliability problems. I eventually switched to various wireless ISPs, which continues to the present day; while people in cities can get Gbps service, I can get, at best, about 50Mbps. Long-distance fees are gone, but the speed disparity remains.
Concluding Reflections I am glad I grew up where I did; the strong community has a lot of advantages I don t have room to discuss here. In a number of very real senses, having no local services made things a lot more difficult than they otherwise would have been. However, perhaps I could say that I also learned a lot through the need to come up with inventive solutions to those challenges. To this day, I think a lot about computing in remote environments: partially because I live in one, and partially because I enjoy visiting places that are remote enough that they have no Internet, phone, or cell service whatsoever. I have written articles like Tools for Communicating Offline and in Difficult Circumstances based on my own personal experience. I instinctively think about making protocols robust in the face of various kinds of connectivity failures because I experience various kinds of connectivity failures myself.
(Almost) Everything Lives On In 2002, Gopher turned 10 years old. It had probably been about 9 or 10 years since I had first used Gopher, which was the first way I got on live Internet from my house. It was hard to believe. By that point, I had an always-on Internet link at home and at work. I had my Alpha, and probably also at least PCMCIA Ethernet for a laptop (many laptops had modems by the 90s also). Despite its popularity in the early 90s, less than 10 years after it came on the scene and started to unify the Internet, it was mostly forgotten. And it was at that moment that I decided to try to resurrect it. The University of Minnesota finally released it under an Open Source license. I wrote the first new gopher server in years, pygopherd, and introduced gopher to Debian. Gopher lives on; there are now quite a few Gopher clients and servers out there, newly started post-2002. The Gemini protocol can be thought of as something akin to Gopher 2.0, and it too has a small but blossoming ecosystem. Archie, the old FTP search tool, is dead though. Same for WAIS and a number of the other pre-web search tools. But still, even FTP lives on today. And BBSs? Well, they didn t go away either. Jason Scott s fabulous BBS documentary looks back at the history of the BBS, while Back to the BBS from last year talks about the modern BBS scene. FidoNet somehow is still alive and kicking. UUCP still has its place and has inspired a whole string of successors. Some, like NNCP, are clearly direct descendents of UUCP. Filespooler lives in that ecosystem, and you can even see UUCP concepts in projects as far afield as Syncthing and Meshtastic. Usenet still exists, and you can now run Usenet over NNCP just as I ran Usenet over UUCP back in the day (which you can still do as well). Telnet, of course, has been largely supplanted by ssh, but the concept is more popular now than ever, as Linux has made ssh be available on everything from Raspberry Pi to Android. And I still run a Gopher server, looking pretty much like it did in 2002. This post also has a permanent home on my website, where it may be periodically updated.

4 August 2022

Reproducible Builds: Reproducible Builds in July 2022

Welcome to the July 2022 report from the Reproducible Builds project! In our reports we attempt to outline the most relevant things that have been going on in the past month. As a brief introduction, the reproducible builds effort is concerned with ensuring no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.

Reproducible Builds summit 2022 Despite several delays, we are pleased to announce that registration is open for our in-person summit this year: November 1st November 3rd
The event will happen in Venice (Italy). We intend to pick a venue reachable via the train station and an international airport. However, the precise venue will depend on the number of attendees. Please see the announcement email for information about how to register.

Is reproducibility practical? Ludovic Court s published an informative blog post this month asking the important question: Is reproducibility practical?:
Our attention was recently caught by a nice slide deck on the methods and tools for reproducible research in the R programming language. Among those, the talk mentions Guix, stating that it is for professional, sensitive applications that require ultimate reproducibility , which is probably a bit overkill for Reproducible Research . While we were flattered to see Guix suggested as good tool for reproducibility, the very notion that there s a kind of reproducibility that is ultimate and, essentially, impractical, is something that left us wondering: What kind of reproducibility do scientists need, if not the ultimate kind? Is reproducibility practical at all, or is it more of a horizon?
The post goes on to outlines the concept of reproducibility, situating examples within the context of the GNU Guix operating system.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 218, 219 and 220 to Debian, as well as made the following changes:

New features:

Support Haskell 9.x series files. [ ]

Bug fixes:

Fix a regression introduced in version 207 where diffoscope would crash if one directory contained a directory that wasn t in the other. Thanks to Alderico Gallo for the testcase. [ ]

Don t traceback if we encounter an invalid Unicode character in Haskell versioning headers. [ ]

Output improvements:

Improve output of Markdown and reStructuredText to use code blocks with highlighting. [ ]

Codebase improvements:

Space out a file a little. [ ]

Update various copyright years. [ ]

Mailing list On our mailing list this month:

Roland Clobus posted his Eleventh status update about reproducible [Debian] live-build ISO images, noting amongst many other things! that all major desktops build reproducibly with bullseye, bookworm and sid.

Santiago Torres-Arias announced a Call for Papers (CfP) for a new SCORED conference, an academic workshop around software supply chain security . As Santiago highlights, this new conference invites reviewers from industry, open source, governement and academia to review the papers [and] I think that this is super important to tackle the supply chain security task .

Upstream patches The Reproducible Builds project attempts to fix as many currently-unreproducible packages as possible. This month, however, we submitted the following patches:

Bernhard M. Wiedemann

openSUSE monthly report

acarsdec (embeds CPU info with march=native)

casacore (embeds CPU info with march=native)

kubernetes (uses random name of temporary directory)

setuptools/python-brotlicffi (toolchain, filesys/readdir)

sysstat (FTBFS in single CPU mode)

sundials (FTBFS in single CPU mode)

nim (FTBFS in single CPU mode)

doxygen/libzypp (toolchain readdir)

python-pyquil (build failure)

openssl-1_0_0 (build failure)

jsonrpc-glib (FTBFS in single CPU mode)

slurm (Link-Time Optimisation and .tar issues)

wasi-libc (sort the output from find)

Chris Lamb:

#1015245 filed against libshumate.

#1015246 filed against zeal.

#1016186 filed against gappa.

Philip Rinn:

#1014877 filed against cxref.

Vagrant Cascadian:

#1014426 filed against cldump.

#1014428 filed against xmacro.

#1014559 filed against libloki.

#1014560 filed against ygl.

#1014561 filed against clp.

#1014564 filed against tvtime.

#1014789 filed against rdiff-backup-fs.

Reprotest reprotest is the Reproducible Builds project s end-user tool to build the same source code twice in widely and deliberate different environments, and checking whether the binaries produced by the builds have any differences. This month, the following changes were made:

Holger Levsen:

Uploaded version 0.7.21 to Debian unstable as well as mark 0.7.22 development in the repository [ ].

Make diffoscope dependency unversioned as the required version is met even in Debian buster. [ ]

Revert an accidentally committed hunk. [ ]

Mattia Rizzolo:

Apply a patch from Nick Rosbrook to not force the tests to run only against Python 3.9. [ ]

Run the tests through pybuild in order to run them against all supported Python 3.x versions. [ ]

Fix a deprecation warning in the setup.cfg file. [ ]

Close a new Debian bug. [ ]

Reproducible builds website A number of changes were made to the Reproducible Builds website and documentation this month, including:

Arnout Engelen:

Add a link to recent May Contain Hackers 2022 conference talk slides. [ ]

Chris Lamb:

Correct some grammar. [ ]

Holger Levsen:

Add talk from FOSDEM 2015 presented by Holger and Lunar. [ ]

Show date of presentations if we have them. [ ][ ]

Add my presentation from DebConf22 [ ] and from Debian Reunion Hamburg 2022 [ ].

Add dhole to the speakers of the DebConf15 talk. [ ]

Add raboof s talk Reproducible Builds for Trustworthy Binaries from May Contain Hackers. [ ]

Drop some Debian-related suggested ideas which are not really relevant anymore. [ ]

Add a link to list of packages with patches ready to be NMUed. [ ]

Mattia Rizzolo:

Add information about our upcoming event in Venice. [ ][ ][ ][ ]

Testing framework The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, Holger Levsen made the following changes:

Debian-related changes:

Create graphs displaying existing .buildinfo files per each Debian suite/arch. [ ][ ]

Fix a typo in the Debian dashboard. [ ][ ]

Fix some issues in the pkg-r package set definition. [ ][ ][ ]

Improve the builtin-pho HTML output. [ ][ ][ ][ ]

Temporarily disable all live builds as our snapshot mirror is offline. [ ]

Automated node health checks:

Detect dpkg failures. [ ]

Detect files with bad UNIX permissions. [ ]

Relax a regular expression in order to detect Debian Live image build failures. [ ]

Misc changes:

Test that FreeBSD virtual machine has been updated to version 13.1. [ ]

Add a reminder about powercycling the armhf-architecture mst0X node. [ ]

Fix a number of typos. [ ][ ]

Update documentation. [ ][ ]

Fix Munin monitoring configuration for some nodes. [ ]

Fix the static IP address for a node. [ ]

In addition, Vagrant Cascadian updated host keys for the cbxi4pro0 and wbq0 nodes [ ] and, finally, node maintenance was also performed by Mattia Rizzolo [ ] and Holger Levsen [ ][ ][ ].

Contact As ever, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

IRC: #reproducible-builds on irc.oftc.net.

Twitter: @ReproBuilds

Mailing list: rb-general@lists.reproducible-builds.org

19 July 2022

Russell Coker: DDC as a KVM Switch

With the recent resurgence in Covid19 I ve been working from home a lot and using both my work laptop and personal PC on the same monitor. HDMI KVM switches start at $150 and I didn t feel like buying one. So I wrote a script to change inputs on my monitor. The following script locks the session on the local machine and switches the monitor s input to the other machine. I ran the command ddcutil vcpinfo grep Input which shows that (on my monitor at least) 60 is the VCP for input. Then I ran the command ddcutil getvcp 60 to get the current value and tried setting values sequentially to find the value for the other port. Below is the script I m using on one system, the other is the same but setting the different port via setvcp. The loginctl command is to lock the screen to prevent accidental keyboard or mouse input from messing anything up.
# lock the session, assumes that seat0 is the only session loginctl lock-session $(loginctl list-sessions grep "seat0 *$" cut -c1-7) # 0xf is DisplayPort, 0x11 is HDMI-1 ddcutil setvcp 60 0x11
For keyboard, mouse, and speakers I m using a USB 2.0 hub that I can switch between computers. I idly considered getting a three-pole double-throw switch (four pole switches aren t available at my local electronic store) to switch USB 2.0 as I only need to switch 3 of the 4 wires. But for the moment just plugging the hub into different systems is enough, I only do that a couple of times a day.
Related posts:

Switches and Cables I ve just read an amusing series of blog posts about...

Load Average Monitoring For my ETBE-Mon [1] monitoring system I recently added a...

how to run dynamic ssh tunnels service smtps disable = no socket_type = stream wait...

Next.

Previous.

Search Results: "tv"

29 December 2022

18 December 2022

8 December 2022

2 December 2022

30 November 2022

16 November 2022

Real world experience This section document not synthetic backups, but actual real world workloads, comparing before and after I switched my workstation to ZFS.

Partitioning The above partitioning procedure used sgdisk, but I couldn't figure out how to do this with sgdisk, so this uses sfdisk to dump the partition from the first disk to an external, identical drive: sfdisk -d /dev/nvme0n1 sfdisk --no-reread /dev/sda --force

References See the zfs documentation for more information about ZFS, and tubman for another installation and migration procedure.

5 November 2022

3 November 2022

1 November 2022

28 October 2022

7 October 2022

Other distributions In openSUSE, Bernhard M. Wiedemann published his usual openSUSE monthly report.

3 October 2022

25 September 2022

File retrievalsIf our ingress is configured right we can download the date.txt file from the URL http://localhost/scs/date.txt: $ curl -s http://localhost/scs/date.txt Sun, 25 Sep 2022 17:21:51 +0200

Final remarksThis post has been longer than I expected, but I believe it can be useful for someone; in any case, next time I ll try to explain something shorter or will split it into multiple entries.

6 September 2022

30 August 2022

Getting On-Line What does it even mean to get on line? Certainly not connecting to a wifi access point. The answer is, unsurprisingly, complex. But for everyone except the large institutional users, it begins with a telephone.

Internet: The Hunt Begins I certainly was aware that the Internet was out there and interesting. But the first problem was: how the heck do I get connected to the Internet?

Partitioning The above partitioning procedure used `sgdisk`, but I couldn't figure out how to do this with `sgdisk`, so this uses `sfdisk` to dump the partition from the first disk to an external, identical drive:
`sfdisk -d /dev/nvme0n1 sfdisk --no-reread /dev/sda --force`

File retrievalsIf our ingress is configured right we can download the `date.txt` file from the URL http://localhost/scs/date.txt:
`$ curl -s http://localhost/scs/date.txt Sun, 25 Sep 2022 17:21:51 +0200`